PREFACE TO THE AMERICAN EDITION 


THIS BOOK is a translation and adaptation of one by the outstand- 
ing Russian mathematician A. Ya. Khinchin, whose original contri- 
butions to mathematical analysis, the theory of numbers, probability 
theory, and statistics have left a mark on modern mathematics. 
The excerpt given below from the preface to the Russian edition 
explains the origin and aims of this book in his own words. 


We frequently encounter a situation in which an engineer, teacher, or economist 
who has at some time studied higher mathematics in a “simplified” course begins to 
feel the need for a broader and, what is more important, a more solid foundation for 
his mathematical knowledge. This need, whether it arises out of specific research by 
the specialist in his own scientific field or comes as an inevitable consequence of the 
general widening of his scientific and cultural horizons, must of course be satisfied. It 
might be supposed that the specialist would find it easy to satisfy this need; he could merely 
take any comprehensive text on mathematical analysis and study it systematically, mak- 
ing use of the rudimentary knowledge he has already acquired. However, experience 
shows that this method, which seems so natural, almost never leads to the desired goal 
but instead often brings disillusionment and a consequent paralysis of any further 
effort. For such a student usually has only limited time at his disposal and therefore 
cannot undertake to work systematically through a full-length textbook. On the other 
hand (and this is probably the most important factor) he does not yet have a firm ground- 
ing in mathematics, and therefore he cannot, without outside help, pick out the essen- 
tials. He will be compelled instead to devote his attention to irrelevant details, and in 
these he will finally get lost, unable to see the forest for the trees. 

And yet very little is needed to satisfy fully the needs of this student. A few years 
ago I had an opportunity to give a special course of lectures devoted to this purpose. 
The course consisted of only twelve two-hour lectures and was one of the series of courses 
offered by the University of Moscow to raise the mathematical qualifications of engi- 
neers. I must confess that at the beginning my task seemed to me to be almost hope- 
less; and yet I have reason to believe that my course, in spite of its brevity, did satisfy 
the needs of the audience. The secret of this success consisted in finding the right key 
to the pedagogical problem that faced me. I renounced from the very beginning any 
idea of presenting even a single topic in full detail: instead, I limited myself to a vivid 
and concrete presentation of the essential points and spoke more of goals and perspec- 
tives, of problems and methods, of the connections of the fundamental notions of analysis 
with each other and with their applications, than of individual theorems and their proofs. 
I did not hesitate, on numerous occasions, to refer my students to a text for details not 
of fundamental significance (sometimes even for entire sequences of theorems and 
proofs). But, in return, I begrudged no time in elucidating concepts, methods, and ideas 
that have a leading and essential significance and tried in every way and with the most 
varied descriptions and images to impress these fundamental notions as vividly and 
effectively as possible upon the consciousness of my listeners. I have grounds for 
believing that after this preparation any student who felt the desire or need for a deeper 
study of a particular topic in analysis was already independently able, first, to find the 
material he needed, and second, to approach its study economically, knowing how to 
distinguish the primary and basic from the secondary and nonessential. 

The many discussions I have had with individual students and groups have firmly 
convinced me that the path I chose was correct. In this connection, I would mention 
that the very large audiences that attended all of these lectures and the small number 
of drop-outs are the best proof of how widespread among engineers is the need for 
raising the level of their mathematical knowledge. 

This book has the same goal as the course of lectures I have just described and tries 
to realize it by the same means. The reader should therefore be warned from the very 
beginning that he will not find here a complete presentation of a university course in 
analysis, or even of individual topics selected from such a course. I have set myself the 
task only of giving a general sketch of the basic ideas, concepts, and methods of 
mathematical analysis. But I have tried to make this sketch as simple and as easy to 
retain as possible, to make it something that can be read and assimilated by anyone 
familiar with even the crudest exposition of the subject, and one which, once assimi- 
lated, should enable the student to study the details of any part of the subject independ- 
ently and effectively. 

At the same time, I hope that this book may also be of real benefit to many students in 
the mathematics departments of universities. Neither a text nor a lecturer, limited as 
they both are by the exigencies of time and the program, can pay enough attention to 
the discussion of fundamental questions; both are compelled to concentrate on 
the exposition of all the details of the material they cover. And yet everyone knows 
how useful it is sometimes to turn one’s eyes away from the trees and look at the forest. I 
would like to believe that this book will help to reveal that broader view to more than one 
future mathematician who is studying analysis for the first time. 


The Survey of Recent East European Mathematical Literature 
wishes to express its appreciation to Mrs. Irena Zygmund for trans- 
lating the book and to Messrs. Louis I. Gordon and C. Clark Kissinger 
for reading the manuscript and for their valuable suggestions. 
1. The Continuum 


1. WHY BEGIN WITH THE CONTINUUM? 


DEFINITION. We shall call the variable y a function of the variable 
x if to each value of the variable x there corresponds a uniquely deter- 
mined value of y. 


This sentence is like a gateway leading into the domain of 
higher mathematics. With it we define the most important and 
basic concept of mathematical analysis, the concept of functional 
dependence.¹ In this concept there appears in embryo the prospect 
of mastering natural phenomena and technological processes by 
means of a mathematical apparatus. That is why we must require 
of this definition absolute clarity; not a single word should leave 
the shadow of a doubt. The least ambiguity in the definition would 
threaten the entire edifice constructed upon this fundamental con- 
cept and would necessitate a complete reconstruction. 

Nevertheless, it turns out on closer inspection that this concise 
formulation with which we began is in many respects incomplete 
and open to various interpretations. We shall dwell only on one 
such doubtful point here, and in clarifying its content we shall be led 
directly to the topic of our first lecture. 

Our definition contains the words to each value of the variable x. 
If there is to be no ambiguity, we must clarify the meaning of the 
term value of the variable. But this is not enough. Our definition 
speaks of each value. It follows that for a function to be defined, it 
is not enough to know some individual values of the variable x to 
which there corresponds a y. We must know the whole collection 
of values to each of which there corresponds a definite value of y. 
In other words, we must know what is called in analysis the domain 
of definition (or simply the domain) of the given function. The set 
of corresponding values of y is called the range of the function. 


¹ A detailed discussion of various points of view regarding functions is given in Lecture 3. 


What can we say about the particular values of a variable? We 
know that they are numbers, and so the set of these values is a set 
of numbers. But what is this set and what numbers does it contain? 
From the very beginning we shall exclude the complex numbers 
from consideration and assume that all values of x and y are real 
numbers. 

But can all real numbers serve as values of the quantity x, and if 
not, which can and which cannot? Nothing was said about this in 
our definition. But this is completely understandable because one 
cannot give the same answer to this question for all functions (and, 
as a matter of fact, not even for the same function in different 
problems). The domain of a function depends on the nature of the 
function as well as on the special problem in which the function is 
employed. One and the same function may have to be considered 
in different problems for different sets of values of the independent 
variable x. 

Examples of functions whose domain of definition is not the set 
of all real numbers are abundant. For instance, the function y = x! 
makes sense (at least within the scope of elementary mathematics) 
only for positive integers x; the function y = log x, only for x > 0; 
etc. One can also construct examples in which the natural domain 
of definition will be a number set of considerably more com- 
plicated structure. 

If, however, we ask ourselves what are the domains of defini- 
tion which occur most often in mathematical analysis, we shall 
have to say that in the overwhelming majority of cases the domain 
of a function is an interval (open or closed), that is, the set of all 
real numbers contained between two given numbers (with the 
given numbers either included or excluded). Sometimes this inter- 
val is represented by a half-line (for example, the set of all x > 0). 
A half-line obviously represents the set of all real numbers which 
are greater (or less) than a certain given number (while the 
condition > or < is sometimes replaced by ≥ or ≤). Finally, there are 
cases in which the interval is represented by the whole straight 
line; that is, all real numbers may serve as values of the variable x. 
We then say that the domain of definition of the function is repre- 
sented by the whole real axis, or the number line. 

In any case, we see that in mathematical analysis the environment 
in which functions exist and unfold their individual characteristics 
is the set of all real numbers. This set is called in mathematics 
the continuum (or, more precisely, the linear continuum). 
And just as the careful gardener examines the soil before planting, 
we shall examine the environment in which this development 
of mathematical analysis from the concept of functional depend- 
ence is to take place. This is why the continuum is the first topic of 
study in any serious and well-constructed course in mathematical 
analysis. Only after the nature of the continuum has been sufficiently 
clarified can we go on to study the concept of functional depend- 
ence. And the structure of the continuum turns out to be not so 
simple as it might appear at first glance. The world of real num- 
bers will unfold before our eyes as a complicated structure abound- 
ing in the most diverse details whose investigation cannot be 
considered complete even to this day. 


2. NEED FOR A THEORY OF REAL NUMBERS 


Why is it impossible to study the continuum before construct- 
ing a complete theory of real numbers? What, then, is the con- 
tinuum? What real numbers exist? And when and how can we be 
sure that we have actually included all the real numbers in our 
theory? 

In elementary algebra we have at our command the set of all 
rational numbers (all integers and fractions, positive, negative, and 
zero). But very soon we begin to notice that these numbers are 
insufficient. For example, among the rational numbers there is no 
number √2; that is, there is no rational number whose square is 
equal to the number 2. But why is it necessary to have such a 
number? It would be needed if only to represent the length of the 
diagonal of a square whose side equals 1. Consequently, if we were 
to forgo the existence of such a number, we would have to 
reconcile ourselves to the fact that the lengths of certain segments which 
arise so naturally and simply in geometry would not be expressible 
by any number. It is clear that metric geometry could not develop 
on such foundations. This means that √2 must find a place among 
the real numbers. But as it does not appear among the rational 
numbers, we call this number irrational. However, the number √2 is by no 
means satisfied with the mere recognition of its existence: it 
immediately demands, first, that we assign to it a definite place 
among the rational numbers, that is, indicate precisely which 
rational numbers are less than √2 and which are greater, and 
second, that we learn to perform the fundamental operations with it. 
For example, √2 > 1 because the diagonal of the square is greater 
than its side, and the sum of the side and the diagonal of the 
square is equal to 1 + √2. Thus we are compelled to assign a 
meaning to the number 1 + √2 also, as it is not rational, and we 
must also include this number in the set of real numbers. These 
demands of the new number are well-founded and justified, and if at 
this moment we do not respond to them, it is only because we are 
about to introduce into our system many other new numbers. All 
of these without exception will present to us the same demands, 
and it will be simpler to satisfy all of them simultaneously than to 
treat in detail each new number separately. 

¹ In some contexts in higher mathematics, the term continuum is reserved for compact 
connected sets. Thus the student may sometimes see a closed interval, for example 
[0,1], referred to as a continuum. 

Following the number √2, all (positive and negative) square 
roots of positive rational numbers enter our system in a natural 
and unavoidable way, then the cube roots, and finally all numbers 
of the form 

    ⁿ√r,    (1) 

where r is any positive rational number and n is any integer greater 
than 1. 

As we know, however, the matter does not end here. Needs just 
as concrete as the case of the diagonal of a square force us in nu- 
merous other instances to introduce new numbers in the form of 
roots of algebraic equations. This occurs whenever a given equa- 
tion does not have roots among the numbers which we have 
already introduced, and yet we cannot deny the existence of these 
roots without depriving ourselves of a numerical description for 
some concrete physical entity. 

Now let us go all the way in this direction. We denote by the 
term algebraic number any (real) root of an equation of the form 
P(x) = 0, where P(x) is any polynomial with integral coefficients, 
and we introduce into our system all real algebraic numbers. In 
particular, we thus introduce all numbers of the form (1) above, 
since the number ⁿ√r is defined as a root of the equation qxⁿ − p = 0, 
where r = p/q is a representation of the rational number r in the 
form of a simple fraction. As a still more special case, every rational 
number r = p/q also belongs to the set of algebraic numbers, since it 
is the root of the equation qx − p = 0. 
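As a small illustration of this definition (our own sketch, not part of the original text), the Python code below builds the integer coefficients of qxⁿ − p from a rational r = p/q and checks them by substitution. The helper names `root_polynomial` and `evaluate` are hypothetical; evaluating at an irrational root in floating point can only show that the value is approximately zero.

```python
from fractions import Fraction


def root_polynomial(r: Fraction, n: int) -> list[int]:
    """Integer coefficients (highest degree first) of q*x**n - p, where r = p/q.

    Fraction keeps p/q in lowest terms with q > 0, so the result has
    integral coefficients, and for r > 0 the n-th root of r is one of its roots.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    p, q = r.numerator, r.denominator
    return [q] + [0] * (n - 1) + [-p]


def evaluate(coeffs: list[int], x):
    """Evaluate a polynomial at x by Horner's rule (int, Fraction, or float)."""
    total = 0
    for c in coeffs:
        total = total * x + c
    return total


r = Fraction(3, 5)
# A rational number r = p/q is the exact root of q*x - p.
print(root_polynomial(r, 1), evaluate(root_polynomial(r, 1), r))  # [5, -3] 0
# The square root of 3/5 is (up to rounding) a root of 5*x**2 - 3.
print(root_polynomial(r, 2), evaluate(root_polynomial(r, 2), (3 / 5) ** 0.5))
```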

It is very easy to order this system of algebraic numbers, that is, 
to formulate a rule allowing us to determine which of two arbitrary 
algebraic numbers is the greater and which is the lesser. It is some- 
what more difficult, although still not too complicated, to formu- 
late rules for performing the ordinary algebraic operations on 
these numbers and to show that the results of these operations are 
again algebraic numbers, so that, and this is a very important 
point, algebraic operations with algebraic numbers always lead to 
algebraic numbers and therefore can never require the introduction 
of new numbers. 
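One instance of this closure can be checked with exact arithmetic. The sketch below assumes a hypothetical helper class `QSqrt2` for numbers of the form a + b√2 with rational a and b, and verifies that the sum 1 + √2 satisfies x² − 2x − 1 = 0, an equation with integral coefficients, so that this sum is again algebraic. It checks a single example and is not a proof of the general statement.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class QSqrt2:
    """The number a + b*sqrt(2) with rational a and b, kept exact."""
    a: Fraction
    b: Fraction = Fraction(0)

    def __add__(self, other: "QSqrt2") -> "QSqrt2":
        return QSqrt2(self.a + other.a, self.b + other.b)

    def __sub__(self, other: "QSqrt2") -> "QSqrt2":
        return QSqrt2(self.a - other.a, self.b - other.b)

    def __mul__(self, other: "QSqrt2") -> "QSqrt2":
        # (a + b*s)(c + d*s) = (ac + 2bd) + (ad + bc)*s, using s*s = 2
        return QSqrt2(self.a * other.a + 2 * self.b * other.b,
                      self.a * other.b + self.b * other.a)


ONE = QSqrt2(Fraction(1))
TWO = QSqrt2(Fraction(2))
SQRT2 = QSqrt2(Fraction(0), Fraction(1))

x = ONE + SQRT2                   # the number 1 + sqrt(2)
residue = x * x - TWO * x - ONE   # x**2 - 2x - 1, computed exactly
print(residue)  # QSqrt2(a=Fraction(0, 1), b=Fraction(0, 1))
```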

Can we perhaps stop here and consider the construction of the 
system of real numbers finished? Can we now recognize the set of 
all algebraic numbers as the continuum? We know well that we 
cannot do that, and we know why we cannot. Although the num- 
bers we have so far introduced are sufficient for many algebraic 
theories, it is precisely for analysis that they fail. In its first steps, 
mathematical analysis adds to the elementary operations of algebra 
the fundamental and most important operation of passage to the 
limit. A number of cases exist in which concrete considerations force 
us to recognize the existence of the limit of this or that sequence of 
numbers. What is more, this limit appears to us as a number which 
has a definite and real meaning, a number on which, in turn, we 
should like to perform algebraic and analytic operations. 

If for every sequence of algebraic numbers to which we found 
it desirable to assign a definite limit, such a limit actually existed 
within the domain of algebraic numbers, then we could freely ac- 
cept this domain as being indeed the continuum and conclude that 
mathematical analysis needs no real numbers other than the 
algebraic numbers. But this is not the situation. If we take a unit 
circle and begin to inscribe regular polygons in it, starting with a 
triangle or a square and proceeding by doubling the number of 
sides indefinitely, the perimeters of all such polygons are algebraic 
numbers and the limit of this numerical sequence is called the cir- 
cumference of the circle. To admit that this limit does not exist 
would forbid geometry to speak of the circumference of a circle. 
We can easily imagine the loss caused by such a restriction not 
only to geometry but also to all other sciences that make use of the 
concept of a circle. 
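The sequence of perimeters described above can be computed numerically. The following sketch (our own, with a hypothetical function name) starts from the square inscribed in the unit circle and repeatedly doubles the number of sides, using the half-angle relation s₂ₙ² = 2 − √(4 − sₙ²) for the side lengths, rearranged to avoid loss of precision. Floating point only approximates the algebraic numbers involved, but the perimeters visibly increase toward 2π.

```python
import math


def inscribed_perimeters(doublings: int) -> list[float]:
    """Perimeters of regular 4-, 8-, 16-, ... gons inscribed in the unit circle.

    Starts from the square (side sqrt(2)) and doubles the number of sides,
    using s_2n = s_n / sqrt(2 + sqrt(4 - s_n**2)), an algebraically equivalent
    form of s_2n**2 = 2 - sqrt(4 - s_n**2) that avoids cancellation.
    """
    n, s = 4, math.sqrt(2.0)
    perimeters = [n * s]
    for _ in range(doublings):
        s = s / math.sqrt(2.0 + math.sqrt(4.0 - s * s))
        n *= 2
        perimeters.append(n * s)
    return perimeters


for p in inscribed_perimeters(5):
    print(f"{p:.10f}")
print(f"2*pi = {2 * math.pi:.10f}")
```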

Yet it is possible to prove that among the algebraic num- 
bers such a limit does not exist. What then is the solution to this 
impasse? It is clear: we must recognize that for the purposes of 
mathematical analysis, algebraic numbers alone do not suffice, and 
that it is necessary to adjoin to them real numbers of a new kind. 
All such nonalgebraic real numbers are called transcendental. We 
denote the number constructed above (the circumference of the 
unit circle) by 2π; thus π is a transcendental number. Another 
important example of a transcendental number is the familiar 
e = 2.718..., which, as you know, is generated by a simple 
passage to the limit of a sequence of rational numbers. The 
transcendence of the numbers e and π was established quite late, in fact not 
until the second half of the nineteenth century. However, the necessity of 
introducing transcendental numbers was established somewhat 
earlier, in the middle of the nineteenth century, by the French 
mathematician Liouville on the basis of other examples which were simpler 
but also less important. 
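The text does not say which sequence of rational numbers it has in mind for e; the sketch below assumes one standard choice, the partial sums of 1 + 1/1! + 1/2! + 1/3! + ..., and computes them exactly with Python's `fractions` module. Every term of the sequence is rational, while its limit e is not. The function name is our own.

```python
import math
from fractions import Fraction


def e_partial_sums(terms: int) -> list[Fraction]:
    """Exact partial sums 1 + 1/1! + ... + 1/N! for N = 0, 1, ..., terms."""
    sums, total, factorial = [], Fraction(0), 1
    for k in range(terms + 1):
        if k > 0:
            factorial *= k      # factorial now equals k!
        total += Fraction(1, factorial)
        sums.append(total)
    return sums


for s in e_partial_sums(5):
    print(s, float(s))
print(math.e)  # the limit of the rational partial sums
```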

Thus the construction of our continuum is not yet finished. 
How, then, are we to proceed? For example, can we stop here and 
say, “The continuum is the set of all algebraic numbers to which, 
as the need arises, we adjoin other numbers (called transcendental) 
which we obtain (as in the case of e and π) from algebraic 
numbers by passage to the limit?” We ask this question because it is on 
just such a basis, though not made explicit, that the majority 
of simplified courses on mathematical analysis are constructed, 
courses which avoid the exposition of a complete theory of irrational 
numbers. But the answer to our question is, of course, no. We can- 
not stop at the point at which we now find ourselves, and for 
numerous quite simple and convincing reasons. 

First of all, the continuum, as the totality of real numbers, should 
be defined once and for all as a given fixed set (on the pattern of 
the definition cited above for the set of all algebraic numbers) 
without leaving open any possibility of subsequently adding more 
numbers to it. Further, the words “as the need arises” in our pro- 
visional definition obviously have no precise meaning. If we have 
a sequence of algebraic numbers which does not have an algebraic 
limit, and if the question arises whether to assign to it a tran- 
scendental limit or to treat it as a sequence without a limit, we 
have the right, from a formal point of view, to follow our own 
judgment. To make the decision, guided not by formal consider- 
ations but by concrete and real ones, no matter how important, is 
to reject the concept of a mathematical definition. We can refuse 
formal existence to the number π. In this particular case, it would 
be highly inconvenient, but in other cases such a refusal might 
lead to no inconvenience at all. It is clear that a criterion which 
would force us to introduce a transcendental number whenever 
it would be awkward to get along without it is not, nor can it by any 
reformulation become, a mathematical criterion. Finally, it is not 
at all certain that the numbers introduced in this manner would be 
sufficient. For we may have to add the new numbers, multiply 
them, and make them pass to a limit (for mathematical analysis 
has no use for numbers which do not permit such operations). 
How can we be sure that the results of all such operations will be 
real numbers belonging to our continuum? For if not, then we 
would again have to add new numbers to our set and thus our 
continuum would still not contain all the real numbers. 

We see then that the position we hypothetically took above is 
untenable. We cannot construct one or two transcendental num- 
bers as examples and say “and so on,” and let it go at that. For by 
this procedure, nothing is really defined at all. 

So we see that we cannot lay a solid foundation for mathemat- 
ical analysis without a general theory of real numbers; a theory 
not limited to individual constructions of new numbers, but con- 
taining the general principle for all such constructions and yielding 
in one stroke the whole set of real numbers. 


3. CONSTRUCTION OF THE IRRATIONAL NUMBERS 


There exist in mathematics a number of different theories for 
the continuum. However, all these theories, it is important to 
remember, approach the problem in essentially the same way. 
In comparison with this essential identity, those details in which 
they differ are as the structural details of a building compared 
with its overall architectural plan. 

Once the rational numbers are given, all these theories aim at 
obtaining from them, at one stroke, the entire set of real numbers 
by means of a single principle of construction. And in the different 
theories this principle takes different forms, but the similarity of these 
theories is not limited to having a single principle of construction. 
All principles which lead to the construction of the new (irrational) 
numbers are, in spite of considerable formal differences, based on 
the same idea in all the theories. This idea is the fundamental 
analytical operation of passage to the limit; all known methods are 
reducible to it and may be considered as different forms of it. For 
example, you know that square roots of natural numbers can be 
realized as limits of appropriately selected sequences of rational 
numbers (approximate square roots). This is true in other cases as 
well. 
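For a concrete sequence of "approximate square roots," the sketch below uses Heron's rule x ↦ (x + n/x)/2, one classical way (not necessarily the one the author has in mind) to produce rational numbers whose squares approach a given natural number n. The function name is hypothetical; `Fraction` keeps every approximation exactly rational.

```python
from fractions import Fraction


def heron_approximations(n: int, steps: int) -> list[Fraction]:
    """Rational approximations to sqrt(n) by Heron's rule x -> (x + n/x)/2, from x = 1."""
    x = Fraction(1)
    approximations = []
    for _ in range(steps):
        x = (x + Fraction(n) / x) / 2   # stays rational at every step
        approximations.append(x)
    return approximations


for x in heron_approximations(2, 4):
    # Each x is rational, and x*x - 2 shrinks rapidly toward 0.
    print(x, float(x * x - 2))
```

For n = 2 the first approximations are 3/2, 17/12, 577/408, and 665857/470832.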

In view of the foregoing, to obtain a full understanding of the 
system of real numbers one need not examine all the different 
theories in detail. It will be sufficient to take as an example any one 
of them, as everything of fundamental importance which we thus 
discover applies equally well in all the other theories. We choose to 
treat the theory of Dedekind, not because it has any essential ad- 
vantages over others, but solely for the practical reason that this 
theory is adopted in a majority of the most widely used textbooks. 
Thus you will have no difficulty in obtaining a textbook where you 
may follow the details omitted from our exposition. 

Before we introduce irrational numbers, we must examine a 
little more carefully the set of rational numbers (which we denote 
by R). First, let us note a very elementary property of this set: 
between any two rational numbers r₁ and r₂ there always exists a 
third rational number. We can see this most easily by noting that 
the arithmetic mean of r₁ and r₂ (that is, (r₁ + r₂)/2) is a rational 
number with the desired property. Repeated application of this fact at 
once establishes that between r₁ and r₂ there are infinitely many 
rational numbers. 

We shall now examine carefully the situation arising from our 
attempts to find or to define √2. (By this symbol we mean 
the positive square root.) First we look among the rational 
numbers (no others exist for us at the moment) for a number 
whose square would equal 2. We can easily establish that such a 
rational number does not exist, but we shall not give here the 
familiar proof of this fact. Thus, if we choose any rational number r, 
then we shall have either r² < 2 or r² > 2. 

Now let us consider only the positive rational numbers. They 
fall naturally into two classes: class A consisting of all the positive 
rationals r₁ such that r₁² < 2, and class B consisting of all the 
positive rationals r₂ such that r₂² > 2. Since r₁ and r₂ are positive, it 
follows from the inequalities r₁² < 2 < r₂² that r₁ < r₂, that is, that 
every number of class A is smaller than every number of class B. It is 
evident that if we now adjoin to class A the number zero and all 
the negative rational numbers, then we shall have a separation of 
the whole set R into two classes, A and B, where every number 
of class A will be smaller than every number of class B. By 
the term cut (or more precisely, a cut in the set R), we shall mean 
any division of the set R into two nonempty¹ classes for which the 
above condition holds (the condition that A < B). Thus the 
separation of the set R into the class B, consisting of all positive 
rationals whose square is greater than two, and the class A, 
consisting of all rationals not in B, determines a cut in R. 
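The two classes of this cut can be written as membership tests on rational numbers. In the sketch below (function names are our own), class B is the set of positive rationals with square greater than 2 and class A is everything else; a finite sample of rationals then illustrates, though of course it does not prove, that every element of A lies below every element of B.

```python
from fractions import Fraction


def in_class_b(r: Fraction) -> bool:
    """Class B of the cut: positive rationals whose square exceeds 2."""
    return r > 0 and r * r > 2


def in_class_a(r: Fraction) -> bool:
    """Class A: every rational not in B (negatives, zero, positives with r*r < 2)."""
    return not in_class_b(r)


# Rationals p/q with 1 <= q <= 12 and -30 <= p <= 30.
sample = [Fraction(p, q) for q in range(1, 13) for p in range(-30, 31)]
a_part = [r for r in sample if in_class_a(r)]
b_part = [r for r in sample if in_class_b(r)]
# Largest sampled element of A and smallest sampled element of B.
print(max(a_part), min(b_part))  # 7/5 17/12
```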

We can construct cuts in R in widely differing ways, some quite 
elementary. For example, by including in class A all rational 
numbers r₁ < 5 and in the class B all rational numbers r₂ ≥ 5, we 
clearly obtain a cut in R. If we represent numbers in the usual 
manner by points on a straight line, then, of course, every cut will 
be represented by a separation of the (rational) points of the line 
into two sets, the first of which is situated entirely to the left of the 
other set. 

At first glance it may appear that all cuts in R are essentially 
the same, that two different cuts differ only by the location at 
which they are made, and that, because of this, one can be 
converted into another by a simple translation. It is extremely im- 
portant to realize that this notion is wrong, and that in the very 
structure of the cuts there may occur profound, and for our pur- 
poses fundamental, differences. 

Notice that the cut in our last example has the property that 
there exists a number (a rational number, since we do not as yet 
have any others) such that all numbers less than it belong to class 
A, and all numbers greater than it belong to class B; in our example 
the number 5 is obviously such a number. We shall call the number 
which has this property the edge of the given cut. Thus the cut in 
our last example has an edge. 

On the contrary, in our first example (concerning √2) there is 
no edge. We shall now prove this. Let us assume that there exists a 
rational number r which is an edge. Then we must have either 
r² < 2 or r² > 2. To be definite, let us suppose that r² < 2. Since r 
is an edge, it follows that for each r′ > r we must have r′² > 2. 

If r < 1, then by taking r′ = 1, we arrive immediately at a 
contradiction. On the other hand, if r ≥ 1, then r² ≥ r. By taking 
2 − r² = c > 0 and r′ = r + c/4, we have 

    r′² = r² + rc/2 + c²/16 ≤ r² + r²c/2 + c²/16 = r² + c − 7c²/16 < r² + c = 2, 

which again leads to a contradiction, since r′ > r. 

¹ A set is called nonempty if it contains at least one element. 
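The argument can be traced with exact rational arithmetic. The sketch below (the function name is hypothetical) takes a rational r with r² < 2 and returns r′ = 1 when r < 1, and r′ = r + c/4 with c = 2 − r² otherwise, exactly as in the proof; in every case r′ > r and r′² < 2, so r cannot be the edge of the cut.

```python
from fractions import Fraction


def larger_member_of_a(r: Fraction) -> Fraction:
    """Given a rational r with r*r < 2, return a rational r' > r with r'*r' < 2.

    Follows the proof in the text: r' = 1 if r < 1, otherwise
    r' = r + c/4 where c = 2 - r*r > 0.
    """
    if not r * r < 2:
        raise ValueError("expected r*r < 2")
    if r < 1:
        return Fraction(1)
    c = 2 - r * r
    return r + c / 4


for r in [Fraction(0), Fraction(1), Fraction(7, 5), Fraction(41, 29)]:
    r_new = larger_member_of_a(r)
    # r_new exceeds r, yet its square is still below 2 (each line ends with True).
    print(r, "->", r_new, r_new > r and r_new * r_new < 2)
```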

Thus all the cuts in R can be classified into two types: those that 
have an edge and those that do not. Moreover, we should keep in 
mind the following: 


(a) A cut cannot have two edges, for if r and r′ were both edges 
of a cut and r < r′, then by the property of R noted above 
(between any two rational numbers there lies a third), there 
would exist a number r″ such that r < r″ < r′. 
But since r is an edge and r″ > r, it would follow that 
r″ ∈ B.¹ At the same time, from the fact that r′ is an edge 
and r″ < r′, it would follow that r″ ∈ A, which is 
contradictory. 

(b) The edge of a cut, if it exists, is either the greatest number 
of the class A or the least number of the class B; if, however, 
there is no edge, then there exists neither a greatest number 
of class A nor a least number of class B. 


(c) Every rational number r₀ is the edge of two different cuts. 
In one of them the class A consists of all numbers r ≤ r₀ (and 
the class B of all numbers r′ > r₀), and in the other the class 
A consists of all numbers r < r₀ (while the class B consists 
of all numbers r′ ≥ r₀). 


(d) The classification of the totality of all cuts in R into two 
types, cuts which have an edge and cuts which do not, is, 
of course, an intrinsic structural property of the set R; this 
would still exist even if we were not at all interested in 
introducing nonrational numbers. 


At this point, the example involving √2 suggests our subsequent 
course of action. The intuitive picture is clear: we see before us the 
number line (the axis of real numbers) cut into two parts at a point 
to which there corresponds no rational number. To renounce the 
existence of such a point would intuitively imply a gap in the 
number line, and the continuum would lose its continuity (its solid and 
gap-free character), that very characteristic to which the continuum 
owes its name. And from the practical point of view, as we have 
already stated, all applied sciences (and first of all geometry) 
would be subjected to a very considerable inconvenience if we 
were to accept the lack of an edge between our two classes. 
Therefore, stimulated by the demands of our intuitive perception as well 
as by very serious practical considerations, we introduce into our 
system of numbers a new number √2, which we define to serve as 
the edge of this cut. Such numbers we call irrational numbers. 

¹ r″ ∈ B means that r″ is an element of the set B. 

But the particular cut we have selected does not differ in 
principle from any other cut in R of the same type, that is, a cut 
which has no (rational) edge. Therefore, in constructing the general 
theory, we extend our definition in a natural way to any cut of this 
type. To every cut in R with no rational edge we assign a new, 
irrational number, which by definition will be its edge. 

In this way, by means of a single principle we define at once the 
whole set of irrational numbers. Together with the previously 
existing set of rational numbers they form the set of all real numbers, 
or the continuum, which is now completely defined. 
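Although an irrational number is defined here simply as the edge of a cut, it can be enclosed between rational numbers as tightly as we please, which ties the construction back to the passage to the limit discussed earlier. The sketch below is our own illustration, not part of Dedekind's construction: it assumes a cut is given by a hypothetical predicate `in_lower` for its class A, and repeatedly bisects an interval whose left end lies in A and whose right end lies in B.

```python
from fractions import Fraction
from typing import Callable


def narrow_cut(in_lower: Callable[[Fraction], bool], lo: Fraction, hi: Fraction,
               steps: int) -> tuple[Fraction, Fraction]:
    """Bisect a cut given by the membership test for its lower class A.

    Requires lo in A and hi in B. After `steps` halvings the returned
    rationals still straddle the cut and differ by (hi - lo) / 2**steps.
    """
    if not (in_lower(lo) and not in_lower(hi)):
        raise ValueError("need lo in the lower class and hi in the upper class")
    for _ in range(steps):
        mid = (lo + hi) / 2
        if in_lower(mid):
            lo = mid
        else:
            hi = mid
    return lo, hi


def sqrt2_lower(r: Fraction) -> bool:
    """Lower class A of the cut that defines sqrt(2)."""
    return r <= 0 or r * r < 2


lo, hi = narrow_cut(sqrt2_lower, Fraction(1), Fraction(2), 20)
print(float(lo), float(hi), hi - lo)
```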


4. THEORY OF THE CONTINUUM 


The principle of construction of the irrational numbers that we 
have introduced by no means exhausts the theory of the continuum; 
on the contrary, the latter really only begins here. The program of 
development which must be effected before we can speak of a 
complete theory of the continuum is still quite extensive. 

First of all, we have to order our continuum, that is, determine 
precisely under what conditions we shall consider one given real 
number as greater or less than another. Further, we have to define 
operations on the real numbers, for so far we haven’t the slightest idea, for example, what is meant by the sum of the numbers 1 and $\sqrt{2}$. And then we have to check carefully that these operations have
the same properties to which we are accustomed in the domain of 
rational numbers. For example, the invariance of a sum when the 
order of the summands is changed (the commutative law of addi- 
tion) is a theorem which we have to prove again for real numbers. 
Finally, we have to verify that the continuum really satisfies all
those requirements of practice and of our intuitive perception for 
the sake of which it was constructed: continuity. 

It is obvious that this program cannot be carried out in complete 
detail within the framework of these lectures; in any case it would 
be extremely tedious. In what follows we shall touch upon only 
some of its most important features. 

To begin with, it is very easy to order our continuum. Suppose that we are given two real numbers $\alpha_1$ and $\alpha_2$ and we wish to establish which one is greater than the other. If both numbers are rational, this problem finds its solution in arithmetic, about which we assume full knowledge. If $\alpha_1$ is irrational and $\alpha_2$ is rational, then the problem is solved at once: the number $\alpha_1$ is the edge of a certain cut (A, B) in R; in accordance with the definition of a cut we shall say that $\alpha_2 < \alpha_1$ or $\alpha_2 > \alpha_1$, depending on whether the rational number $\alpha_2$ belongs to class A or to class B of this cut. Suppose, finally, that the numbers $\alpha_1$ and $\alpha_2$ are both irrational. Since these numbers are not identical, the two cuts corresponding to them are different, and hence the lower classes $A_1$ and $A_2$ of these cuts are different. This means that one of these sets, say $A_2$, contains a rational number r which is not in $A_1$. From $r \in A_2$ it follows that $r < \alpha_2$, and from¹ $r \notin A_1$ it follows that $r \in B_1$; hence $r > \alpha_1$. Thus there exists a rational number r such that $\alpha_1 < r < \alpha_2$. When this is the case, we shall say that $\alpha_1 < \alpha_2$. On the other hand, if we had found an $r'$ such that $\alpha_2 < r' < \alpha_1$, then we would say $\alpha_2 < \alpha_1$. One of these two conditions must hold, as we have just shown, and this completes the ordering of the continuum.
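The comparison rule can be imitated by a brute-force search for a rational witness r lying in $A_2$ but not in $A_1$. The sketch assumes that each real number is handed to us as a predicate for the lower class of its cut, and that both numbers lie in the interval (−bound, bound); the names and the parameters `bound` and `max_denominator` are illustrative choices, not part of the text. When both edges are irrational, a returned witness establishes $\alpha_1 < \alpha_2$, while `None` only means that no witness was found within the search limits.

```python
from fractions import Fraction
from typing import Callable, Optional

# A real number is represented (for illustration only) by the lower class of its cut.
LowerClass = Callable[[Fraction], bool]


def rational_between(A1: LowerClass, A2: LowerClass, bound: int = 10,
                     max_denominator: int = 100) -> Optional[Fraction]:
    """Search for a rational r with r in A2 and r not in A1.

    When both edges are irrational, such an r satisfies alpha1 < r < alpha2.
    Assumes both numbers lie in (-bound, bound). Returning None only means that
    no witness with denominator <= max_denominator was found.
    """
    for q in range(1, max_denominator + 1):
        for p in range(-bound * q, bound * q + 1):
            r = Fraction(p, q)
            if A2(r) and not A1(r):
                return r
    return None


def sqrt2(r: Fraction) -> bool:
    """Lower class of the cut with edge sqrt(2)."""
    return r <= 0 or r * r < 2


def sqrt3(r: Fraction) -> bool:
    """Lower class of the cut with edge sqrt(3)."""
    return r <= 0 or r * r < 3


print(rational_between(sqrt2, sqrt3))  # a witness exists, so sqrt(2) < sqrt(3)
print(rational_between(sqrt3, sqrt2))  # no witness in the other direction
```

The search necessarily succeeds when $\alpha_1 < \alpha_2$, because the argument above guarantees a rational between them; it can never succeed in the opposite direction, since then every element of $A_2$ already belongs to $A_1$.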

However, we have finished only with the definition of the ordering; its properties are yet to be established. We have to show that $\alpha_1 < \alpha_2$ is incompatible with $\alpha_1 > \alpha_2$ and that from the inequalities $\alpha_1 < \alpha_2$ and $\alpha_2 < \alpha_3$ it follows that $\alpha_1 < \alpha_3$. In short, we have yet to show that the inequalities between real numbers obey the same basic laws as the inequalities between rational numbers. But you should have no trouble in proving these propositions.

Among other things, the foregoing argument shows that between any two irrational numbers there exists a rational number. We have seen previously that the same is true if both given numbers are rational. Suppose now that r is rational and α is irrational, and suppose, for definiteness, that $r < \alpha$. We shall show that in this case also there are rational numbers between r and α. The number α is the edge of a cut (A, B) in R, and from $r < \alpha$ it follows that $r \in A$. But in a cut having an irrational edge the class A does not have a greatest number. Hence there exists a rational number $r' > r$ which belongs to the class A and consequently is less than α. Thus we have $r < r' < \alpha$.

¹ The symbol $\notin$ means “is not an element of.”

It follows that between any two real numbers there is an infinite set of rational numbers (having found one rational number between them, we can apply the same argument to it and either of the given numbers, and so on indefinitely). We express this important property of the set R of all rational numbers by saying that R is everywhere dense (in the continuum). It is easy to show that the set of irrational numbers is also everywhere dense. We need only consider all numbers of the form $r\sqrt{2}$ where r runs through the set of all nonzero rational numbers; all such numbers are irrational, and they alone already form an everywhere dense set. Strictly speaking, the expression $r\sqrt{2}$ will acquire a precise meaning only after the operations on real numbers have been defined, and so we shall take up this question at once.

We have no reason to consider this problem in complete detail because the method for constructing definitions of the fundamental operations with real numbers will become clear if we carefully examine only one example. Let $\alpha_1$ and $\alpha_2$ be two real numbers; we wish to define their sum $\alpha_1 + \alpha_2$. Regardless of whether the numbers $\alpha_1$ and $\alpha_2$ are rational or irrational, they are both edges of certain cuts in R, which we denote by $(A_1, B_1)$ and $(A_2, B_2)$ respectively. Further, let $a_1, b_1, a_2, b_2$ denote arbitrary numbers belonging to the sets $A_1, B_1, A_2, B_2$ respectively. It is evident that every number of the form $a_1 + a_2$ is less than every number of the form $b_1 + b_2$. We shall now show that there exists one and only one real number α such that for arbitrary $a_1, b_1, a_2, b_2$ (belonging, of course, to the corresponding sets) we have the inequalities

$$a_1 + a_2 \le \alpha \le b_1 + b_2.$$

We shall then naturally define the sum of $\alpha_1$ and $\alpha_2$ to be equal to α. The existence and uniqueness of the sum will then follow from the existence and uniqueness of the number α.

Let us consider the cut (A, B) in R defined as follows. If a rational number r is smaller than all numbers of the form $b_1 + b_2$, then we shall put it in class A; otherwise, in class B. It is easy to see that the division of R so determined is actually a cut. We denote by α the edge of this cut. It is evident that all numbers of the form $a_1 + a_2$ belong to the class A, while all numbers of the form $b_1 + b_2$ belong to the class B. Therefore

$$a_1 + a_2 \le \alpha \le b_1 + b_2; \qquad (2)$$

that is, the number α satisfies the stipulated conditions. We have yet to establish its uniqueness.

For this purpose we must first make sure that it is possible to choose the numbers $a_1$ and $b_1$ so that their difference $b_1 - a_1$ is arbitrarily small. To show this, let a be any rational number of class $A_1$ and let c be an arbitrarily small positive rational number. In the sequence of rational numbers

$$a,\ a + c,\ a + 2c,\ \ldots,\ a + nc,\ \ldots$$

the first term belongs to class $A_1$, and possibly several more terms do as well. But since $a + nc$ increases indefinitely with n, beginning at a certain place all the following terms will belong to class $B_1$. Therefore, there exists an integer k such that

$$a + kc = a_1 \in A_1, \qquad a + (k + 1)c = b_1 \in B_1,$$

while $b_1 - a_1 = c$. Hence, $a_1$ and $b_1$ may be chosen so that the difference between them is arbitrarily small. Similarly, $a_2$ and $b_2$, and hence the numbers $a_1 + a_2$ and $b_1 + b_2$, can be selected in such a way as to be arbitrarily close to each other.
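The stepping argument is, in effect, an algorithm: walk along $a, a + c, a + 2c, \ldots$ until the next term leaves $A_1$. A minimal sketch, again assuming that a lower class is given as a predicate on rationals (the helper names are hypothetical), brackets $\sqrt{2}$ and $\sqrt{3}$ in this way and so traps their sum between $a_1 + a_2$ and $b_1 + b_2$, two rationals exactly $2c$ apart.

```python
from fractions import Fraction
from typing import Callable, Tuple

LowerClass = Callable[[Fraction], bool]


def bracket(in_A: LowerClass, a: Fraction, c: Fraction) -> Tuple[Fraction, Fraction]:
    """Return (a + k*c, a + (k+1)*c) with the first term in A and the second in B.

    Requires a in A and c > 0. The loop ends because the terms a + n*c grow without
    bound, and the upper class B contains every rational above any of its elements.
    """
    if not in_A(a) or c <= 0:
        raise ValueError("need a in A and c > 0")
    k = 0
    while in_A(a + (k + 1) * c):
        k += 1
    return a + k * c, a + (k + 1) * c


def sum_bounds(A1: LowerClass, A2: LowerClass, start1: Fraction, start2: Fraction,
               c: Fraction) -> Tuple[Fraction, Fraction]:
    """Rationals a1 + a2 and b1 + b2 enclosing alpha1 + alpha2, exactly 2c apart."""
    a1, b1 = bracket(A1, start1, c)
    a2, b2 = bracket(A2, start2, c)
    return a1 + a2, b1 + b2


def sqrt2(r: Fraction) -> bool:
    return r <= 0 or r * r < 2


def sqrt3(r: Fraction) -> bool:
    return r <= 0 or r * r < 3


print(sum_bounds(sqrt2, sqrt3, Fraction(1), Fraction(1), Fraction(1, 10)))
```

Shrinking c squeezes the two bounds together, which is exactly the property used in the uniqueness argument that follows.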

Let us now suppose that there exist two real numbers α and α' satisfying all inequalities of the form (2) and suppose, for definiteness, that $\alpha < \alpha'$. As we know, there exists a pair of rational numbers r, r' such that

$$\alpha < r < r' < \alpha',$$

and together with the inequalities (2) these relations indicate that any number of the form $a_1 + a_2$ differs from any number of the form $b_1 + b_2$ by a quantity exceeding the constant $r' - r$. However, this contradicts what was shown above. In this way the uniqueness of α is demonstrated.

The above definition of addition is convenient in that it permits us to extend at once all the basic laws governing the addition of rational numbers to the addition of arbitrary real numbers. Try to prove, for example, the commutative law and you will see how easily everything comes out.

As we have already stated, we shall not dwell here on the defi- 
nitions of the other operations or on proofs of the laws governing 
them. We shall only mention that it is best to define multiplication 
analogously to addition, and then to define subtraction and division 
as the inverse operations. 

We shall now turn to the last important problem in this area: to 
show that the continuum that we have defined actually has conti- 
nuity, the gap-free nature it must have in order to serve as the 
basis of mathematical analysis, and whose absence in the set R of 
rationals compelled us to introduce irrational numbers. 

In order to answer this question, let us recall what led us 
to speak of the absence of such continuity in R. It was that 
among the cuts in R there occurred some which did not have an 
edge belonging to the set R. Thus, if we can show that for the set 
of real numbers such a thing cannot happen, that is, that every cut 
in the set of real numbers has an edge in the real numbers, we may 
consider our task completed and be assured that the continuum 
which we have constructed satisfies the requirements imposed on it. 

To avoid any misunderstanding, let us observe that the cuts 
mentioned in the italicized statement above are not the same as the 
Dedekind cuts which were used to define real numbers. Previously, 
we have always spoken of cuts in the set R of rational numbers, 
and now for the first time we speak of a cut in the set of real num- 
bers, or the continuum. However, the formal definition of a cut is 
unchanged. 

We shall now prove the existence of our desired edge. Let (A, B) be an arbitrary cut in the continuum. Every rational number (as indeed every real number) belongs either to class A or to class B; in this way the cut (A, B) in the continuum induces a cut (A', B') in R. Let α be the real number constituting the edge of the cut (A', B'). We shall show that α is also the edge of the cut (A, B), and the proof of our assertion will be complete.

We shall have to show that every real number $\alpha_1 < \alpha$ belongs to class A and every real number $\alpha_2 > \alpha$ belongs to class B; by symmetry it is enough to prove the first part of this statement. Let r be a rational number situated between $\alpha_1$ and α. Since $r < \alpha$, it follows that $r \in A' \subset A$,¹ and since $\alpha_1 < r$, it follows that $\alpha_1 \in A$.

¹ The notation $A' \subset A$ means that the set A' is contained in (is a subset of) the set A.
5. FUNDAMENTAL LEMMAS OF THE REAL NUMBER SYSTEM


With the definition of real numbers, we have laid the founda- 
tion for mathematical analysis. In building the principles of analysis 
on this foundation, we shall, of course, have to refer frequently to 
this basic definition. This will entail some inconvenience, as the con- 
struction and investigation of the necessary cuts is usually rather 
cumbersome. 

The way in which mathematics finds a solution to this difficulty 
is most instructive, since it may be considered typical of all logical
situations of this kind which are frequently encountered in the 
mathematical sciences. In the process of developing mathematical 
analysis it is noticed that although the direct application of the 
definition of real numbers appears rather frequently in the reason- 
ing, many of these applications are very similar in form to one 
another. Actually almost all of these applications follow one 
of three or four formal patterns (but with a different content each 
time, of course). Given such a situation, it would obviously be very 
uneconomical and would make the development and mastery of 
the given branch of science much more difficult if the same
logical constructions had to be worked out dozens of times anew, 
changing only their specific subject matter each time. 

With complete justification, mathematics long ago acquired the 
habit, in situations of this kind, of formulating such recurring logi- 
cal patterns as auxiliary propositions, or lemmas. Once such a 
lemma is proved, it becomes unnecessary to repeat on each occa- 
sion the formal construction supporting it; one need merely cite the 
lemma. In our case, after having proved three or four such auxiliary 
propositions, we shall in the future almost never have to return to 
the construction of cuts. We shall be able to replace this construc- 
tion each time by a reference to one of the principal lemmas, 
which form, so to speak, a number of small bridges connecting 
mathematical analysis with its logical foundations. It goes without 
saying that the choice of these basic lemmas may vary in different 
expositions. However, in every case we can counsel the reader not 
to begrudge time and effort spent in mastering a great number of 
such lemmas. The purpose of each one of them is to lighten work in 
the future and the effort spent will not be wasted. 

We shall now illustrate with a few examples the formulation 
and proof of such lemmas. 
A sequence of real numbers

$$a_1, a_2, \ldots, a_n, \ldots \qquad (3)$$

is called monotonic if either $a_n \le a_{n+1}$ for all n, or $a_n \ge a_{n+1}$ for all n. In the first case we speak of a monotonically nondecreasing sequence, and in the second of a monotonically nonincreasing sequence. The sequence (3) is called a bounded sequence if there exists a number c such that

$$|a_n| < c \qquad (n = 1, 2, \ldots).$$

LEMMA 1. Every monotonic bounded sequence of real numbers has a limit.


Proof. Suppose for definiteness that

$$a_n \le a_{n+1}, \qquad |a_n| < c \qquad (n = 1, 2, \ldots).$$

Let us divide the continuum into two sets A and B, including in B every real number which is greater than all the $a_n$ (in particular, the number c belongs to the set B) and including in the set A all remaining numbers (in particular, all the numbers $a_n$). It is evident that such a division of the continuum is a cut. Letting α be the edge of this cut, we shall show that¹ $\lim a_n = \alpha$, which will prove Lemma 1.

First of all, we note that for any n we have $a_n \le \alpha$. For if we had $a_n > \alpha$, then from the definition of a cut we would have $a_n \in B$, which contradicts the definition of B. If we now assume, contrary to our assertion, that α is not the limit of the sequence (3), then there must exist a positive constant ε such that for an infinite set of the numbers n we have the inequality

$$\alpha - a_n > \varepsilon,$$

whence $a_n < \alpha - \varepsilon$. But because of the monotonicity of the sequence, if this inequality is satisfied for an infinite set of values of n, then it must be satisfied for all n. By virtue of the definition of the set B, it follows that $\alpha - \varepsilon \in B$, while from $\alpha - \varepsilon < \alpha$ it follows that $\alpha - \varepsilon \in A$ (since α is the edge of the cut (A, B)). This contradiction proves our lemma.


It should be noticed that monotonicity cannot be omitted from the hypothesis of the lemma. The existence of the limit does not follow from boundedness alone, as can be seen, for example, in the case of the sequence $a_n = (-1)^n$.

¹ Here the limit is taken as $n = 1, 2, \ldots$ increases indefinitely. The theory of limits is discussed in detail in Lecture 2.

The lemma on monotonic sequences finds a number of applications not only in analysis but also in elementary mathematics. In this latter field we often introduce it as an axiom, seldom taking into account that it is simply not true unless the number system under consideration includes all the real numbers. This defect can be observed not only in elementary mathematics but also in “simplified” courses in mathematical analysis.

We now observe that the existence of the circumference of a circle (the limit of the perimeters of inscribed regular polygons as the number of sides increases indefinitely) and the existence of the number

$$e = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n$$

are proved most simply by using this lemma.
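Lemma 1 can be watched at work on the sequence $(1 + 1/n)^n$. The check below uses exact rational arithmetic, and the choice of 50 terms is arbitrary; it only confirms monotonicity and the bound 3 for the terms it actually computes, so it is an illustration and not a substitute for the proof that the whole sequence increases and stays below 3.

```python
from fractions import Fraction


def e_term(n: int) -> Fraction:
    """Exact value of (1 + 1/n)**n as a rational number."""
    return (1 + Fraction(1, n)) ** n


terms = [e_term(n) for n in range(1, 51)]

# Lemma 1 needs monotonicity and boundedness; check both on the computed terms.
assert all(s < t for s, t in zip(terms, terms[1:]))  # strictly increasing
assert all(t < 3 for t in terms)                     # bounded above by 3
print(float(terms[-1]))  # about 2.69, still creeping up toward e
```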

Besides Lemma 1, stating the basic property of bounded mono- 
tonic sequences, there is an analogous proposition of no less sig- 
nificance concerning monotonic functions of a continuously chang- 
ing variable. 


LEMMA 1'. If the variable x tends toward a from the left and if f(x) is bounded and monotonic in an interval whose right-hand end point is the point a, then f(x) tends to a limit as x approaches a.


The boundedness of the function f(x) within an interval whose end points are $a - \varepsilon$ and a (where $\varepsilon > 0$) implies the existence of a number c such that $|f(x)| < c$ for $a - \varepsilon < x < a$, and its monotonicity implies that the ratio

$$\frac{f(x_1) - f(x_2)}{x_1 - x_2}$$

is either $\ge 0$ for any pair of numbers $x_1, x_2$ ($x_1 \ne x_2$) belonging to this interval, or it is $\le 0$ for any such pair.


Proof. The proof of Lemma 1' may be carried out very simply by using Lemma 1. To be definite, suppose that f(x) is nondecreasing; that is, $f(x_1) \le f(x_2)$ for $a - \varepsilon < x_1 < x_2 < a$. Since for $n > \frac{1}{\varepsilon}$ we have

$$a - \varepsilon < a - \frac{1}{n} < a - \frac{1}{n + 1} < a,$$

the increasing sequence $a - \frac{1}{n}$ $\left(n > \frac{1}{\varepsilon}\right)$ is contained within the interval whose end points are $a - \varepsilon$ and a, and has as its limit the number a. The corresponding sequence $f\left(a - \frac{1}{n}\right)$ is, of course, bounded and nondecreasing. Thus, by Lemma 1 there exists a limit:

$$\lim_{n \to \infty} f\left(a - \frac{1}{n}\right) = b.$$

If the number x is sufficiently close to a, then we can find an integer $n = n(x)$ such that

$$a - \frac{1}{n} \le x < a - \frac{1}{n + 1}.$$

Since the function f is monotonic,

$$f\left(a - \frac{1}{n}\right) \le f(x) \le f\left(a - \frac{1}{n + 1}\right). \qquad (4)$$

But as $x \to a$ it is evident that $n \to \infty$, and consequently the left and the right sides of the inequalities (4) have b as their common limit. Hence $f(x) \to b$ as $x \to a$, and Lemma 1' is thus established.


We shall denote by [a, b] the closed interval with end points a and b (the set of all numbers x satisfying the inequalities $a \le x \le b$), and by (a, b) the open interval with the same end points (all x such that $a < x < b$). We shall call a sequence of intervals

$$[a_1, b_1],\ [a_2, b_2],\ \ldots,\ [a_n, b_n],\ \ldots \qquad (5)$$

a sequence of nested intervals if this sequence satisfies the following two conditions:

(a) $a_n \le a_{n+1} < b_{n+1} \le b_n$ $(n = 1, 2, 3, \ldots)$; each successive interval is entirely contained in the preceding one;

(b) $\lim_{n \to \infty} (b_n - a_n) = 0$; the lengths of the intervals tend to zero as their indices increase indefinitely.


Lemma 2. If the sequence (5) is a sequence of nested intervals, 
then there exists one and only one real number belonging to all these 
intervals. 


It would, of course, be possible to carry out the proof of this very useful lemma by constructing an appropriate cut; however, it will be much simpler to employ Lemma 1. The proof runs as follows:


Proof. By virtue of condition (a), the sequence

$$a_1, a_2, \ldots, a_n, \ldots$$

is monotonic and bounded (this latter fact follows from $a_n < b_1$ for all n), and thus by Lemma 1 it has a limit. Let us set

$$\lim_{n \to \infty} a_n = \alpha.$$

Since for any positive integer k we have the inequality

$$a_k < b_n \qquad (n = 1, 2, 3, \ldots),$$

it follows that

$$\alpha \le b_n \qquad (n = 1, 2, 3, \ldots),$$

so that

$$a_n \le \alpha \le b_n \qquad (n = 1, 2, 3, \ldots). \qquad (6)$$

Thus, the number α is contained in all the intervals $[a_n, b_n]$. Moreover, there can be only one such number. For if there existed two numbers α and β satisfying the inequalities (6), then (assuming that $\alpha < \beta$) we would have

$$a_n \le \alpha < \beta \le b_n \qquad (n = 1, 2, 3, \ldots)$$

and, consequently,

$$b_n - a_n \ge \beta - \alpha \qquad (n = 1, 2, 3, \ldots).$$

But this contradicts property (b) of a sequence of nested intervals, and thus establishes Lemma 2.
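Bisection gives a concrete sequence of nested intervals. In the sketch below (illustrative only; the function name is hypothetical) each $[a_n, b_n]$ keeps $a_n^2 < 2 < b_n^2$ and has half the length of its predecessor, so conditions (a) and (b) hold, and the single common point guaranteed by Lemma 2 is $\sqrt{2}$.

```python
from fractions import Fraction
from typing import List, Tuple


def nested_intervals_sqrt2(steps: int) -> List[Tuple[Fraction, Fraction]]:
    """Bisection intervals [a_n, b_n] with a_n**2 < 2 < b_n**2, starting from [1, 2]."""
    a, b = Fraction(1), Fraction(2)
    intervals = [(a, b)]
    for _ in range(steps):
        m = (a + b) / 2
        if m * m < 2:   # m lies below sqrt(2): keep the right half
            a = m
        else:           # m * m > 2, since no rational squares to 2: keep the left half
            b = m
        intervals.append((a, b))
    return intervals


ivs = nested_intervals_sqrt2(30)
# Condition (a): a_n <= a_{n+1} < b_{n+1} <= b_n.
assert all(a0 <= a1 < b1 <= b0 for (a0, b0), (a1, b1) in zip(ivs, ivs[1:]))
# Condition (b): the interval with index n (counting from 0) has length 1/2**n.
assert all(b - a == Fraction(1, 2 ** n) for n, (a, b) in enumerate(ivs))
print(float(ivs[-1][0]), float(ivs[-1][1]))
```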


It should be remarked that in the hypothesis of Lemma 2 it is essential to include in every interval its end points. If we omitted this condition and instead of $[a_n, b_n]$ we considered the open intervals $(a_n, b_n)$, then our lemma would be false. For example, the sequence of open intervals

$$(0, 1),\ \left(0, \tfrac{1}{2}\right),\ \ldots,\ \left(0, \tfrac{1}{n}\right),\ \ldots$$

has no point common to all the intervals (such a point x would have to satisfy $0 < x < \frac{1}{n}$ for every n, which is impossible).

The following lemma of more recent origin also serves frequently as a very convenient tool in proving theorems in analysis.
We shall say that the family M (generally infinite) of intervals covers a (closed) interval [a, b] if every point of the latter lies in the interior of at least one interval of the family M.


LEMMA 3 (Heine-Borel). If a family M of intervals covers the closed interval [a, b], then it is possible to select from this family a finite subfamily M' which also covers the interval [a, b].


Proof. If the interval $\Delta_1 = [a, b]$ cannot be covered by any such finite subfamily of intervals, then dividing it into two equal parts we can assert that at least one of them cannot be covered by a finite subfamily (if both halves could be covered by finite subfamilies, then the whole interval could be so covered). Let us denote this half by $\Delta_2$ (if neither half can be covered by a finite subfamily, then let $\Delta_2$ denote the one on the left) and divide it again into two equal parts. We assert again that at least one of them (let us denote it by $\Delta_3$) cannot be covered by a finite subfamily of M. We can repeat this process indefinitely and thus form a sequence of nested intervals $\Delta_1, \Delta_2, \Delta_3, \ldots, \Delta_n, \ldots$ (the length of $\Delta_n$ is $(b - a)/2^{n-1}$, which tends to zero).

By Lemma 2, there exists one and only one point α which belongs to all these intervals. Let Δ be an interval belonging to the family M which contains α as an interior point. Since the length of the interval $\Delta_n$ tends to zero as $n \to \infty$, and for each n we have $\alpha \in \Delta_n$, it follows that $\Delta_n \subset \Delta$ for some sufficiently large n.

We thus arrive at a contradiction: the interval $\Delta_n$, which by definition cannot be covered by any finite subfamily of intervals, is covered by a single interval from M. This contradiction proves Lemma 3.
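The halving argument can also be run forwards as a procedure that actually produces a finite subfamily. The sketch assumes a hypothetical interface `interval_at(x)` that returns, for each point x, an open interval of M containing x; it bisects exactly as in the proof but stops on any piece that lies inside a single member. Termination is not automatic for an arbitrary rule, so a depth limit is included. For the example family $\left(x - \frac{x+1}{10},\ x + \frac{x+1}{10}\right)$ on [0, 1], every radius is at least 1/10, and three halvings are already enough.

```python
from fractions import Fraction
from typing import Callable, List, Tuple

Interval = Tuple[Fraction, Fraction]  # an open interval (left, right)


def finite_subcover(interval_at: Callable[[Fraction], Interval], lo: Fraction,
                    hi: Fraction, max_depth: int = 60) -> List[Interval]:
    """Return finitely many intervals of the family whose interiors cover [lo, hi].

    interval_at(x) is assumed to return an open interval of the family containing x.
    """
    m = (lo + hi) / 2
    left, right = interval_at(m)
    if left < lo and hi < right:  # the whole piece lies inside one member
        return [(left, right)]
    if max_depth == 0:
        raise RuntimeError("no finite subcover found within the depth limit")
    # Otherwise halve the piece, as in the proof, and cover each half separately.
    return (finite_subcover(interval_at, lo, m, max_depth - 1)
            + finite_subcover(interval_at, m, hi, max_depth - 1))


def example_family(x: Fraction) -> Interval:
    """Hypothetical family: around each x, the open interval of radius (x + 1)/10."""
    r = (x + 1) / 10
    return x - r, x + r


cover = finite_subcover(example_family, Fraction(0), Fraction(1))
print(len(cover), cover)
```

The returned pieces partition [lo, hi] into closed subintervals, each lying inside one open member, so their union covers the whole closed interval.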


We have now not only constructed a foundation for mathematical analysis, but, in having proved three very important lemmas, we have constructed so solid a foundation that the further development of the subject can be carried forward efficiently. The basic concepts and methods, the ideas and logical devices which are used in this development, you will learn in the succeeding lectures.
2. Limits


6. WHAT IS A LIMIT? 


The concept of limit is one of the most important in mathematical analysis. You are, of course, familiar with many theorems on
limits; nevertheless we shall now study this familiar concept 
thoroughly, both to make it more precise and to broaden it. 

Let us first of all consider the meaning of the following sentence: 
The variable x (in a given phenomenon or process) tends to a (or in 
other words, has the limit a), which we write symbolically in one of 
two ways: 


$$x \to a \quad\text{or}\quad \lim x = a.$$


In attempting to make the definition of a limit satisfy all necessary 
formal requirements (and without this it is quite impossible to call 
mathematics a science), we meet at once a peculiar and character- 
istic difficulty. The fact is that in a precise definition of the concept 
of limit, we cannot admit such terms as phenomenon or process, 
whose mathematical sense is completely unclear. And yet, any for- 
mulation of the ordinary definition of a limit is very difficult with- 
out the use of these (or equivalent) terms. For we usually say: for 
any neighborhood¹ U of a, all values of the quantity x from some
point on in the given process are contained in U. Or sometimes we 
say: no matter how small we take the positive number $\varepsilon$, the quantity $x - a$ becomes and remains smaller in absolute value than $\varepsilon$ in the
course of the given process. We must now find some way to formulate 
this definition so that it contains no terms whose precise mathemat- 
ical meaning is questionable. Frankly, we have to say that it is 
doubtful whether this problem can be solved in any satisfactory 
manner. Modern mathematicians prefer to renounce its solu- 
tion entirely and to consider the notation lim x = a as devoid of 
any mathematical content. This does not mean, of course, that 
they are willing to renounce the concept of a limit. How then do 
they get out of this difficulty? 


¹ A neighborhood of the number a is any open interval containing this number.
The fact is that whenever the notion of limit occurs in analysis, we always meet a situation where a function y tends to a limit as the independent variable behaves in some definite manner. For example: y tends to b as x tends to a, or $a_n$ tends to b as n increases indefinitely; in symbols,

$$y \xrightarrow[x \to a]{} b \quad\text{or}\quad \lim_{x \to a} y = b,$$

$$a_n \xrightarrow[n \to \infty]{} b \quad\text{or}\quad \lim_{n \to \infty} a_n = b.$$

Such a sentence has in fact a very definite meaning. Thus, for example, the expression

$$\lim_{x \to a} y = b$$

means the following: for every neighborhood V of b there exists a neighborhood U of a such that $y \in V$ whenever $x \in U$, except perhaps when $x = a$. As you can see, this precise definition of a limit is entirely independent of the notion of a process.¹ It states simply that y is arbitrarily close to b provided x is sufficiently close to a.
Of course, for a person not yet used to the typical characteristics 
of mathematical formalism this might appear strange. How is it 
possible that the sentence 


y approaches the limit b as x approaches the limit a (A) 


can have a definite meaning while the phrase x approaches the limit 
a by itself means nothing? Actually there is nothing unacceptable 
or even unusual in this situation. The sentence (A) must be regarded 
as something whole and monolithic, and it is not at all necessary 
that parts of it, taken separately, should make sense; it is only 
necessary that the sentence taken as a whole have a definite 
meaning. 
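For a concrete instance of the neighborhood definition, take $y = x^2$, $a = 2$, $b = 4$, and use symmetric neighborhoods $V = (4 - \varepsilon, 4 + \varepsilon)$ and $U = (2 - \delta, 2 + \delta)$; this suffices, since every neighborhood of b contains one of this form. If $\delta \le 1$, then $|x + 2| < 5$ on U, so $|x^2 - 4| = |x - 2|\,|x + 2| < 5\delta$, and thus $\delta = \min(1, \varepsilon/5)$ works. The code merely spot-checks this choice at sample points of U (it proves nothing); the function names are hypothetical.

```python
from fractions import Fraction


def delta_for(eps: Fraction) -> Fraction:
    """Radius of U = (2 - delta, 2 + delta) for y = x**2, a = 2, b = 4, V = (4 - eps, 4 + eps)."""
    # If |x - 2| < delta <= 1, then |x + 2| < 5, so |x**2 - 4| = |x - 2| * |x + 2| < 5 * delta <= eps.
    return min(Fraction(1), eps / 5)


def spot_check(eps: Fraction, samples: int = 1000) -> bool:
    """Test y in V at sample points of U. This illustrates the definition; it proves nothing."""
    d = delta_for(eps)
    points = [2 + d * Fraction(k, samples) for k in range(-samples + 1, samples)]
    return all(abs(x * x - 4) < eps for x in points)


for eps in (Fraction(10), Fraction(1), Fraction(1, 1000)):
    print(eps, delta_for(eps), spot_check(eps))
```

Note that the whole sentence “$y \to b$ as $x \to 2$” is being checked here; nothing in the code refers to x “approaching” anything on its own.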

It will be seen at once that our definition of a limit includes the two cases most important to analysis: the limit of a sequence, $\lim_{n \to \infty} a_n$, and the limit of a function of a continuously varying quantity, $\lim_{x \to a} y$. In the sentence (A) both a and b can represent either real numbers or $+\infty$ or $-\infty$.


¹ Note that the concept of a limit at a is a property of the function about, but not necessarily at, a. Thus, y at x = a need not equal b, nor need y even be defined at x = a. Further, the cases $x \to +\infty$, $x \to -\infty$, $y \to +\infty$, and $y \to -\infty$ do not require special definitions if by a neighborhood of $+\infty$ (or of $-\infty$) we mean the set of all numbers greater (or smaller) than some arbitrary number.

7. SOME WAYS OF TENDING TOWARD A LIMIT


Let y = f(x) tend to b as x tends to a, where a and b are numbers.¹ Let us assume, for simplicity, that all values of x are greater than a. Symbolically we express this by a convenient notation:

$$x \to a + 0$$

(rather than $x \to a$, $x > a$). Thus, we have

$$\lim_{x \to a + 0} f(x) = b. \qquad (1)$$

Let us first assume that for all x > a sufficiently close to a we have $f(x) > b$; then in the vicinity of a, f(x) has one of the two geometric representations shown in Figures 1 and 2. In Figure 1 the closer x is to a, the closer f(x) is to b, decreasing monotonically toward b as $x \to a + 0$. In Figure 2 everything is different: as $x \to a + 0$ the function f(x) changes nonmonotonically, sometimes increasing, sometimes decreasing. Of course, these important differences are no obstacle to the analytic representation of both types of variation by the same formula (1), which describes the basic fact common to both: y is as close as we please to b, provided x is sufficiently close to a.

[Fig. 1] [Fig. 2]

The case where $y < b$ for all x sufficiently close to and greater than a is completely analogous to the case above: it is depicted in Figures 3 and 4.


¹ That is, a and b represent finite real numbers, in contradistinction to $+\infty$ and $-\infty$.

[Fig. 3] [Fig. 4]


Finally, it may happen that to the right of x = a and arbitrarily close to it, f(x) assumes values greater than b as well as values less than b (Fig. 5). In this case the approach of y toward the limit b as $x \to a + 0$ is, of course, not monotonic. If f(x) is continuous (which we tacitly assumed in all the figures), then it necessarily becomes equal to its limiting value b at an infinite number of points situated to the right of and arbitrarily close to a. In geometric terms, the graph of y = f(x) crosses the line y = b an infinite number of times in the vicinity of x = a.

[Fig. 5]

These cases clearly exhaust all the possible ways in which the continuously varying quantity y may tend to b as $x \to a + 0$. We shall not consider separately the cases in which $x \to a - 0$ (i.e., where x approaches a from the left). We obtain the corresponding graphs from Figures 1-5 by reflecting them about the line x = a.
If, as we assumed, $y \to b$ as $x \to a$, we have $y \to b$ as $x \to a + 0$ as well as $y \to b$ as $x \to a - 0$. Moreover, each of the five above-mentioned types of behavior of y to the right of a can occur together with each of the five analogous types to the left of a in the approach of x. Thus, we obtain, in all, 25 different ways in which y may behave as x approaches a.

Let us also observe that sometimes it is convenient to express the cases represented in Figures 1 and 2 in the form

$$y \xrightarrow[x \to a + 0]{} b + 0,$$

and those in Figures 3 and 4 in the form

$$y \xrightarrow[x \to a + 0]{} b - 0.$$

Up to now, we have assumed that a and b are numbers. You can enumerate for yourselves without difficulty all the possible types of behavior of y = f(x) where one or both of these letters denote $+\infty$ or $-\infty$. For example, when $a = +\infty$ and b is a number, all possible cases are represented in Figures 1'-5'.

[Figs. 1'-5']


The function f(x) is monotonic in Figures 1' and 3', but not monotonic in the remaining figures. In Figure 5', if f(x) is continuous, then it becomes equal to its limit b an infinite number of times for arbitrarily large values of x.

When a is a number and $b = +\infty$, there are essentially only two cases as $x \to a + 0$. These are shown in Figures 6 and 7.

[Fig. 6] [Fig. 7]





8. THE LIMIT OF A CONSTANT FUNCTION 


As you undoubtedly know, we always consider the limit of a 
constant (a function which assumes only one value) as existing and 
equal to its unique value. Such an agreement is sometimes a cause 
of confusion. The notion of limit arises, after all, in connection 
with a variable quantity. How then can we speak of the limit of a 
constant? 
A little reflection, however, shows that nothing is amiss. In the first place, such an agreement follows logically from our definition of a limit. For, if y = b for every value of x, then given any neighborhood V of b we have $y \in V$; from which, by the definition of a limit, we have

$$\lim_{x \to a} y = b. \qquad (2)$$

Secondly, such an agreement is expedient, in fact, almost unavoidable. Suppose for a moment that we agreed to consider a constant function as not having a limit, and suppose that for some function y = f(x) the relation (2) is valid; then, as we know, $\lim_{x \to a} (-y) = -b$. But $y + (-y) = 0$ for all values of x; the sum of the functions y and $-y$, each of which has a limit, turns out to be a constant and thus has no limit. Hence, in the formulation of the familiar theorem on the limit of the sum of two variables we would have to add the condition “provided this sum is not a constant.” Similar artificial and awkward restrictions would also have to be added to almost every theorem on limits. To free ourselves from this, we attribute a limit to every constant function; and since, as we have just seen, we do not thereby violate the general definition of a limit, this convention is adopted in all expositions of mathematical analysis.


9. INFINITELY SMALL AND INFINITELY LARGE QUANTITIES 


The quantity y = f(x) is called infinitely small, or infinitesimal, as $x \to a$, if as $x \to a$ we have $y \to 0$. In many expositions of analysis
the notion of the infinitesimal plays a fundamental role in the 
theory of limits. It is defined prior to the general concept of 
a limit, and in fact, the limit is often defined in terms of an 
infinitesimal. In contrast to this, a number of modern scholars, who 
consider the expression infinitesimal quantity as superfluous and 
liable to cause (indeed as actually causing) confusion and misun- 
derstanding, advise us to dispense entirely with this term. 

In this connection we should note that what is being referred to 
as superfluous is not the concept of a quantity tending to zero (this 
concept plays an essential role in any development of analysis and 
in all its applications) but only the term “infinitesimal quantity.”
This expression is indeed completely inappropriate and frequently 
leads to misconceptions. The term seems to be trying to describe the 
magnitude of the quantity to which it is applied, and it is misleading
to designate as infinitely small a quantity which in certain stages
of its variation may not be small at all. In reality, of course, the 
term is intended to describe only the character of variation of the 
quantity and not its magnitude. This apparent misnomer stems 
from an era when one ascribed a completely different meaning to 
this concept. 

As we have observed, however, quantities tending to zero are 
so frequently encountered in analysis and its applications that it 
would be very difficult to do without a special short name for 
them, while on the other hand, the replacement of a long-estab- 
lished name by some other term would be one of those terminolog- 
ical disasters whose disagreeable effects linger on long afterwards. 
Therefore, we shall retain the term on the condition that we do not 
take the notion of an infinitely small quantity as the basis of the 
theory of limits, but treat it only as a special case of a quantity 
tending to a limit. The danger of misunderstanding and confusion 
of concepts will then be greatly minimized. 

A quantity y = f(x) is said to be infinitely large (as x → a) if

$$\lim_{x \to a} |y| = +\infty.$$

Thus, an infinitely large quantity tends either to +∞ or to −∞, or
does not tend to any limit, taking alternately positive and negative
values which increase indefinitely in absolute value. An example of
such behavior is given by the function y = n(−1)ⁿ as n → +∞
through integral values. As far as the term infinitely large is
concerned, we can repeat all that has been said about infinitely small.
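A small Python sketch of the example y = n(−1)ⁿ (an illustration only; the function name `y` is ours). Beyond any bound M the values satisfy |y| > M, yet both signs keep occurring, so y is infinitely large without tending to +∞ or to −∞.

```python
def y(n: int) -> int:
    """The quantity y = n * (-1)**n, for positive integers n."""
    return n * (-1) ** n

# |y| grows without bound, but consecutive values have opposite signs,
# so y is infinitely large without tending to +inf or to -inf.
print([y(n) for n in range(1, 9)])  # [-1, 2, -3, 4, -5, 6, -7, 8]

M = 1000
tail = [y(n) for n in range(M + 1, M + 11)]
print(all(abs(v) > M for v in tail), min(tail) < 0 < max(tail))  # True True
```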

We have no reason to dwell here on the familiar theorems on 
the sum, difference, and product of infinitely small quantities. Let 
us recall only two matters which receive too little attention and 
therefore frequently lead to misunderstanding: 

(1) In the theorems on the sum and product of quantities 
involving infinitesimals, the condition that the number of 
terms (or factors) be finite is really essential; without this 
restriction both theorems are simply false, as is easy to
show by simple examples.

(2) We cannot formulate any general theorem about the ratio 
of two infinitely small quantities. Such a ratio can vary in 
an arbitrary manner and, in particular, may not tend to any 
limit at all. 
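Both warnings can be checked on simple examples. The two below are our own choices and not necessarily the author's. In (1), the number of terms grows with n, and every term 1/n tends to 0, yet the sum stays equal to 1. In (2), α = x and β = x sin(1/x) both tend to 0 as x → 0, but their ratio sin(1/x) keeps jumping between −1 and +1 along the points chosen.

```python
import math
from fractions import Fraction

# (1) Infinitely many terms: the sum of n terms, each equal to 1/n.
#     Every term tends to 0 as n -> infinity, yet the sum is exactly 1 for every n.
for n in (10, 100, 1000):
    total = sum((Fraction(1, n) for _ in range(n)), Fraction(0))
    print(n, total)  # the sum is 1 each time

# (2) A ratio of two infinitesimals with no limit as x -> 0:
#     alpha = x and beta = x*sin(1/x) both tend to 0, but beta/alpha = sin(1/x).
def alpha(x: float) -> float:
    return x

def beta(x: float) -> float:
    return x * math.sin(1 / x)

# Points x_k = 1/(pi/2 + k*pi) tend to 0, and sin(1/x_k) alternates between -1 and +1.
xs = [1 / (math.pi / 2 + k * math.pi) for k in range(1, 7)]
print([round(beta(x) / alpha(x), 6) for x in xs])  # [-1.0, 1.0, -1.0, 1.0, -1.0, 1.0]
```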
Before we proceed further we need to make a few remarks
concerning the symbols +∞ and −∞, which we have already used
quite frequently. You know, of course, that these symbols do not
denote numbers. The best and clearest answer to the question
of the meaning of the symbol +∞ is that this symbol in itself
has no meaning. Only the expression a neighborhood of +∞ has a
meaning, and by this expression we understand the set of all real
numbers greater than some (arbitrary) real number. It is convenient
to have a short name for such a set, as in analysis we meet with it
at almost every step.

Simultaneously with the definition of a neighborhood of +∞,
such expressions as $\lim_{x \to a} y = +\infty$ (where a may be either +∞,
−∞, or some real number) also acquire a meaning. Indeed, in our
definition of the expression $\lim_{x \to a} y = b$ the letters a and b entered
only in connection with their neighborhoods. Therefore, each of
these letters can reasonably be replaced by the symbol +∞ (or
−∞) after we have established what sets are to be neighborhoods
of that symbol, and there is no need to ascribe any meaning to the
symbol itself.
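The point that the definition refers only to neighborhoods can be made concrete in a short Python sketch. The helper `neighborhood` is hypothetical and ours. It represents +∞ and −∞ by `math.inf` and `-math.inf` purely as tags, and returns a membership test for a single neighborhood. One and the same check of "x ∈ U implies y ∈ V" then covers a finite point and an infinite point alike. Testing finitely many points of course proves nothing about a limit; the sketch only shows the shape of the definition.

```python
import math
from typing import Callable

Predicate = Callable[[float], bool]

def neighborhood(point: float, r: float) -> Predicate:
    """Membership test for one neighborhood of `point` (hypothetical helper).

    point may be a real number, math.inf (standing for +oo) or -math.inf (for -oo):
      real number p: the open interval (p - r, p + r), with r > 0;
      +oo:           the half-line of all reals greater than r (r arbitrary);
      -oo:           the half-line of all reals less than r (r arbitrary).
    """
    if point == math.inf:
        return lambda x: x > r
    if point == -math.inf:
        return lambda x: x < r
    if r <= 0:
        raise ValueError("radius must be positive for a real point")
    return lambda x: point - r < x < point + r

# lim_{x -> 0} 1/x**2 = +oo: for the neighborhood V = (M, +oo) of +oo,
# the neighborhood U = (-1/sqrt(M), 1/sqrt(M)) of 0 works (x != 0).
M = 1e6
V = neighborhood(math.inf, M)
U = neighborhood(0.0, 1 / math.sqrt(M))
xs = [s * 10.0 ** -k for k in range(4, 9) for s in (1, -1)]
print(all(U(x) and V(1 / x**2) for x in xs))  # True

# lim_{x -> +oo} 1/x = 0: for V = (-eps, eps), the neighborhood U = (1/eps, +oo) works.
eps = 1e-3
V0 = neighborhood(0.0, eps)
U_inf = neighborhood(math.inf, 1 / eps)
print(all(U_inf(x) and V0(1 / x) for x in (1001.0, 1e4, 1e9)))  # True
```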

It is clear from all this how much caution is necessary in using
the symbols +∞ and −∞. In particular, it is completely inadmissible
to perform any arithmetic operations with them (1/∞ = 0, etc.)
as is done in some simplified courses in analysis. Similarly, all kinds
of equalities in which the symbol +∞ or −∞ appears other than
explicitly in the role of a limit, as for example tan(π/2) = +∞, have
no meaning.¹ On the other hand, we do assign a definite meaning
to inequalities of the form a < +∞, b > −∞, and −∞ < c <
+∞. They mean, respectively, that a is either a number or the
symbol −∞, that b is either a number or the symbol +∞, and that
c is a number. Finally, we must decide, when speaking of the
existence of the limit of a function f(x), whether this limit must be a
number or whether we also accept as a limit one of the symbols
+∞ and −∞. It is clear, of course, that in this case we can
decide either way and that, consequently, the choice should be
made on the basis of expediency. It is usually agreed that, unless
explicitly stated otherwise, the sentence f(x) has a limit is to mean
that this limit is a number. We shall hereafter observe this rule.

¹A correct version would be: $\lim_{x \to \pi/2 - 0} \tan x = +\infty$ and $\lim_{x \to \pi/2 + 0} \tan x = -\infty$.
10. CAUCHY’S CONDITION FOR THE LIMIT OF A FUNCTION


One of the most important problems in the theory of limits is
obviously the following: Given a function y = f(x), determine
whether it has a limit as x → a. Note that in view of the agreement
just made, we are concerned with the existence of a number b such
that y → b as x → a. Here a can signify either a number or one of
the symbols +∞ and −∞, and we are considering only the
existence of the limit b; the actual value of this limit is of no concern to
us at this moment. The following important criterion for the exist- 
ence of a limit turns out to be especially useful in theoretical 
investigations. 


CAUCHY’S CONDITION.¹ A necessary and sufficient condition for a
function y = f(x) to approach a limit as x → a is the following: For
any ε > 0 there exists a neighborhood U of the number (or of the
symbol) a such that for any two numbers x₁ and x₂ in U we have


$$|f(x_1) - f(x_2)| < \varepsilon.$$


Proof. (i) Suppose that y tends to the number b as x → a. By
the definition of a limit there exists a neighborhood U of a such
that for x ∈ U


$$b - \frac{\varepsilon}{2} < f(x) < b + \frac{\varepsilon}{2}.$$


Therefore, if x₁ ∈ U and x₂ ∈ U, both f(x₁) and f(x₂) lie
between b − ε/2 and b + ε/2 and, consequently, differ from each
other by less than ε. This proves the necessity of Cauchy’s condition.

(ii) Let us assume now that Cauchy’s condition is satisfied. For
every positive integer n there exists a neighborhood Uₙ of a such
that for x₁ ∈ Uₙ and x₂ ∈ Uₙ we have


$$|f(x_1) - f(x_2)| < \frac{1}{n}. \qquad (3)$$


Every such neighborhood Uₙ is either an open interval or a half-line;
moreover, we have the right to assume that Uₙ₊₁ ⊂ Uₙ
(n = 1, 2, …). If the interval (half-line) Uₙ₊₁ were not part of the
interval (half-line) Uₙ, then we could simply replace the interval
(half-line) Uₙ₊₁ by the intersection U′ₙ₊₁ of Uₙ and Uₙ₊₁. (This


¹Cauchy’s condition is also known as Cauchy’s criterion, Cauchy’s test, Cauchy’s
convergence principle, and others.
intersection is, of course, not empty.) It is evident that U′ₙ₊₁ is an
interval (half-line). Moreover, U′ₙ₊₁ ⊂ Uₙ, and for any x₁ ∈ U′ₙ₊₁
and x₂ ∈ U′ₙ₊₁ we have


$$|f(x_1) - f(x_2)| < \frac{1}{n+1}.$$


It follows that the neighborhood U′ₙ₊₁ can replace the
neighborhood Uₙ₊₁ in all respects.

Since f(x) satisfies the condition (3) in Uₙ, the whole set Mₙ of
its values in this neighborhood is contained within a closed
interval of length 2/n (for example, one centered at any of these
values). For definiteness, let Δₙ denote the smallest closed interval
containing Mₙ; its length does not exceed 1/n. But Uₙ₊₁ ⊂ Uₙ;
consequently, Mₙ₊₁ ⊂ Mₙ and, hence, Δₙ₊₁ ⊂ Δₙ. And since the
length of the interval Δₙ tends to zero as n → ∞, the intervals Δₙ
form a sequence of nested intervals. By Lemma 2 of Lecture 1
(page 19) we can assert that there exists a unique number b which
belongs to all the intervals Δₙ.

Finally, we shall show that $\lim_{x \to a} y = b$. For this purpose we
denote by V an arbitrary neighborhood of b. Since b belongs to every
Δₙ and the lengths of the Δₙ tend to zero, it follows that Δₙ ⊂ V for
all sufficiently large n; therefore, if x ∈ Uₙ, then f(x) ∈ Mₙ ⊂ Δₙ ⊂ V,
and so $\lim_{x \to a} y = b$. This completes the proof that Cauchy’s
condition is sufficient.
In particular, for a sequence of numbers


$$a_1, a_2, \ldots, a_n, \ldots$$


to have a limit it is necessary and sufficient that for any ε > 0 we
have


$$|a_n - a_m| < \varepsilon$$


for all sufficiently large m and n. Cauchy’s condition is seldom
used directly to prove the existence of a limit for a specific func- 
tion. For this purpose we usually apply simpler criteria; however, 
these criteria are not characteristic, that is, necessary and sufficient 
at the same time. On the other hand, in general theoretical investi- 
gations it is just this property that makes Cauchy’s condition an 
almost indispensable tool, as we shall see later on. 
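The sequence form of Cauchy's condition can be illustrated with exact rational arithmetic. The examples and the helper `partial_sum` are our own. For aₙ = 1 + 1/2 + … + 1/n the condition fails with ε = 1/2, since a₂ₙ − aₙ ≥ 1/2 for every n. For bₙ = 1 + 1/4 + … + 1/n² it holds, since bₘ − bₙ < 1/n whenever m > n. The loops only spot-check finitely many indices; the inequalities in the comments are what establish the behavior for all n.

```python
from fractions import Fraction

def partial_sum(n: int, power: int) -> Fraction:
    """Exact partial sum 1/1**power + 1/2**power + ... + 1/n**power."""
    return sum((Fraction(1, k ** power) for k in range(1, n + 1)), Fraction(0))

# a_n = 1 + 1/2 + ... + 1/n violates Cauchy's condition with eps = 1/2:
# for every n, taking m = 2n gives a_m - a_n = 1/(n+1) + ... + 1/(2n) >= n * 1/(2n) = 1/2.
for n in (1, 10, 100, 500):
    gap = partial_sum(2 * n, 1) - partial_sum(n, 1)
    print(n, float(gap), gap >= Fraction(1, 2))

# b_n = 1 + 1/4 + ... + 1/n**2 satisfies it: for m > n,
# b_m - b_n < 1/(n(n+1)) + ... + 1/((m-1)m) = 1/n - 1/m < 1/n.
for n, m in ((10, 20), (100, 400)):
    gap = partial_sum(m, 2) - partial_sum(n, 2)
    print(n, m, float(gap), gap < Fraction(1, n))
```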
11. A REMARK ON THE FUNDAMENTAL THEOREMS ON LIMITS


You are, of course, well acquainted with the fundamental 
theorems of the theory of limits, such as the theorems on the limit 
of a sum, difference, and product; there is no need to prove them 
here or even to state them. However, we shall have to make a 
remark in connection with this group of theorems. Since this 
remark will apply in the same degree to all theorems of this group, 
it will be enough to illustrate it with reference to any one of them. 

When we say “the limit of the sum of two quantities equals
the sum of their limits” (and this is the way the theorem is stated in
most simplified courses) we implicitly assume that all three
quantities, the two terms and the sum, tend to limits and we are
concerned only with the interrelation of these limits. However, we
assume more than is necessary: it is enough to assume the exist- 
ence of a limit for each of the terms, and it will follow that 
the sum must also have a limit (equal to the sum of the limits of 
the terms). This requires no additional reasoning, as it follows from 
any proof of the theorem on the limit of a sum. 

On the other hand, from the fact that the sum has a limit 
it does not necessarily follow that each of the terms has a limit. 
For example, suppose that y = f(x) does not tend to any limit as
x → a; it is evident that the same will be true for 1 − y, while the
sum of these two quantities (being a constant) has the limit 1 as
x → a.
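A concrete instance in sequence form (with n → ∞ in place of x → a): take yₙ = (−1)ⁿ, our own choice of example. Neither yₙ nor 1 − yₙ has a limit, but their sum is identically 1. The function names are illustrative.

```python
def y(n: int) -> int:
    """y_n = (-1)**n: no limit as n -> oo (it keeps jumping between -1 and +1)."""
    return (-1) ** n

def one_minus_y(n: int) -> int:
    """1 - y_n: alternates between 2 and 0, so it has no limit either."""
    return 1 - y(n)

values = {(y(n), one_minus_y(n), y(n) + one_minus_y(n)) for n in range(1, 1001)}
print(sorted(values))  # [(-1, 2, 1), (1, 0, 1)]
```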

Thus the theorem on the limit of a sum should read: 


THEOREM. If each of a finite set of quantities tends to a limit as 
x — a, then the sum of these quantities also tends to a limit and this 
limit is equal to the sum of the limits of the individual terms. 


Analogous formulations may be made of all the other theorems 
of this group. 


12. PARTIAL LIMITS; THE UPPER AND LOWER LIMITS 


We shall now investigate in detail the behavior of a function of
x in the neighborhood of x = a when it fails to have a limit as
x → a. The so-called simplified courses usually leave this question
untouched.
We shall call a number b a partial limit¹ of y = f(x) as x → a
if for any neighborhood U of a and any neighborhood V of b there
exists x ∈ U such that x is different from a and f(x) ∈ V. This
simple definition can also be applied when b is one of the symbols
+∞ and −∞. Less formally, a partial limit as x → a is a number
b such that arbitrarily close to a there are values of x for which
y differs from b by as little as we please. (Of course, when a or b is
+∞, instead of speaking of numbers as close to a (or b) as we
please, we speak of numbers as large as we please; and analogously
when a or b is −∞.)

We define analogously the partial limit of a sequence
a₁, a₂, …, aₙ, … as n → ∞.

If $\lim_{x \to a} y$ exists, then by a direct application of the definition we
see that it is also a partial limit of y (and, in fact, the only one).
This is true regardless of whether this limit is a number or one of
the symbols +∞ and −∞. But if there is no limit, then y has at
least two partial limits, and a necessary and sufficient condition for
the existence of one and only one partial limit is the existence of
$\lim_{x \to a} y$. To establish these assertions we shall prove the following:


PROPOSITION 1. The function y = f(x) has at least one partial
limit as x → a.


PROPOSITION 2. If b is the only partial limit of the function
y = f(x) as x → a, then $\lim_{x \to a} y = b$.

In all cases the limits and partial limits can be either numbers
or the symbols +∞ and −∞.


Proof of Proposition 1. If one of the symbols +∞ or −∞ is a
partial limit of f(x) as x → a, then the proposition is established.
Therefore, we may assume that this is not the case, and we shall
then show that there exists a number b which is a partial limit of
f(x) as x → a.

Since neither +∞ nor −∞ is a partial limit of f(x) as x → a,
there exists a pair of numbers α and β (α < β) such that for all x
sufficiently close to a (x ≠ a),


$$\alpha \le f(x) \le \beta.$$


¹Since the existence of a partial limit b of a sequence implies the existence of a
subsequence having b as its limit, such partial limits are often referred to as subsequential
limits.
Thus, in any neighborhood of a there exists an x such that f(x)
belongs to the interval [α, β], which we denote by Δ₁. Let us agree to
denote by (A) this property of Δ₁. If we divide this interval into halves,
then at least one of these halves must have property (A). For
if some neighborhood U₁ of a contains no x for which f(x) belongs
to the right half of Δ₁, and some neighborhood U₂ contains no
x such that f(x) belongs to the left half of this interval, then
the neighborhood U consisting of the intersection of U₁ and
U₂ cannot, of course, contain any x for which f(x) ∈ Δ₁. In other
words, the interval Δ₁ would not possess property (A). Therefore, we
can choose a half of the interval Δ₁ which has property (A), denote it
by Δ₂, and deal with it in the same way we have dealt with Δ₁; that
is, we divide it into halves and denote by Δ₃ the one which
has property (A).

Continuing this process indefinitely, we clearly obtain a sequence
of nested intervals Δ₁, Δ₂, Δ₃, …, Δₙ, … . Let b be the unique
number common to all these intervals and let U and V denote
arbitrary neighborhoods of a and b respectively. Since b belongs to
every Δₙ and the lengths of the Δₙ tend to zero, there exists an
interval Δₙ ⊂ V; but every interval Δₙ has property (A) and,
consequently, there exists an x ∈ U such that f(x) ∈ Δₙ ⊂ V. But
this means that b is a partial limit of y as x → a.
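The halving construction can be imitated for a sequence, but only heuristically. Property (A) concerns terms arbitrarily far out, and a program can look only at a finite window of indices. In the sketch below, the helpers `hits` and `bisect_partial_limit`, the window 10 000 ≤ n < 20 000, and the rule "prefer the left half" are all our own assumptions. The proof allows either half that has property (A). For aₙ = (−1)ⁿ(1 + 1/n), whose partial limits are −1 and +1, preferring the left half homes in on −1.

```python
from typing import Callable

Sequence = Callable[[int], float]

def hits(seq: Sequence, lo: float, hi: float, start: int, stop: int) -> bool:
    """Finite stand-in for property (A): some term a_n with start <= n < stop lies in [lo, hi].

    Property (A) itself concerns terms arbitrarily far out in the sequence;
    a finite window can only approximate it.
    """
    return any(lo <= seq(n) <= hi for n in range(start, stop))

def bisect_partial_limit(seq: Sequence, alpha: float, beta: float,
                         depth: int = 20, start: int = 10_000,
                         stop: int = 20_000) -> float:
    """Approximate a partial limit by repeated halving of [alpha, beta].

    At each step keep the left half if it (apparently) has property (A),
    otherwise keep the right half.
    """
    lo, hi = alpha, beta
    for _ in range(depth):
        mid = (lo + hi) / 2
        if hits(seq, lo, mid, start, stop):
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def a(n: int) -> float:
    """a_n = (-1)**n * (1 + 1/n): partial limits -1 and +1, and |a_n| <= 2."""
    return (-1) ** n * (1 + 1 / n)

print(bisect_partial_limit(a, -2.0, 2.0))  # close to -1
```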


Proof of Proposition 2. By virtue of Proposition 1, the quantity
y has at least one partial limit b as x → a; if the relation
$\lim_{x \to a} y = b$ is not true, then there exists a neighborhood V of the
number (or symbol) b having the property that in any neighborhood U of a
there exists an x such that f(x) is outside of V. Hence, there must
clearly occur one of the following two situations: either one of the
symbols +∞ and −∞ (other than b) is a partial limit of y as
x → a, or there exists an interval [α, β] situated outside V such that
any neighborhood U of a contains an x for which f(x) ∈ [α, β]. In
the first case, y has a partial limit other than b, contrary to the
hypothesis, and Proposition 2 is established. In the second case, we
can show by precisely the same method as in the proof of
Proposition 1 that, as x → a, the quantity y has a partial limit b′ contained
in [α, β] and, consequently, different from b. Thus Proposition 2 is
proved in this case also.


We have established, then, that every quantity which does not
have a limit as x → a (where a may be +∞ or −∞) must have at
least two partial limits. In general, we can assert nothing more.
Thus, the quantity y = (−1)ⁿ (where n is a positive integer)
obviously has exactly two partial limits, +1 and −1, as n → ∞.
On the other hand, there are cases where there exists an infinite
number of partial limits. For example, the function y = sin(1/x) as x → 0
has for partial limits all numbers in the interval [−1, +1]. For as
x → 0, the quantity sin(1/x) changes continuously an infinite number
of times from −1 to +1 and back again; hence, for any number b
of the interval [−1, +1], we can find an arbitrarily small number
x (that is, x belonging to an arbitrarily small neighborhood of
zero) for which sin(1/x) = b.
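The points used in this argument can be written down explicitly. Solving sin(1/x) = b gives, for instance, 1/x = arcsin b + 2πk (k = 1, 2, …), and these x tend to 0. The Python sketch below (the function name is ours) produces such points for several values of b and confirms numerically that sin(1/x) equals b there, up to rounding.

```python
import math

def points_where_sin_inv_equals(b: float, count: int = 5) -> list[float]:
    """Points x_k -> 0+ with sin(1/x_k) = b (up to rounding), for b in [-1, 1].

    Solve 1/x = asin(b) + 2*pi*k for k = 1, 2, ...; since asin(b) >= -pi/2,
    the right-hand side is positive and increases with k, so x_k > 0 decreases to 0.
    """
    if not -1.0 <= b <= 1.0:
        raise ValueError("b must lie in [-1, 1]")
    base = math.asin(b)
    return [1.0 / (base + 2 * math.pi * k) for k in range(1, count + 1)]

for b in (-1.0, -0.3, 0.0, 0.5, 1.0):
    xs = points_where_sin_inv_equals(b)
    worst = max(abs(math.sin(1.0 / x) - b) for x in xs)
    print(f"b = {b:+.1f}: smallest x = {xs[-1]:.5f}, max |sin(1/x) - b| = {worst:.1e}")
```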


In spite of the fact that the sets of partial limits for different 
functions may differ so widely, they nevertheless have certain fea- 
tures in common which are of essential importance in analysis. We 
shall now consider some of them. 


PROPERTY I. If a number (or symbol) b is not a partial limit of
y = f(x) as x → a, then there exists a neighborhood V of b which
does not contain any partial limit of y as x → a.


Proof. If b is not a partial limit of y as x → a, then there exists
a neighborhood V of b and a neighborhood U of a such that y
cannot belong to V for x ∈ U (x ≠ a). But since the open interval (or
half-line) V is a neighborhood of each of its points, it follows that no
point of V can be a partial limit of y as x → a, which proves Property I.


Before going on to Property II, let us observe the following: If
among the partial limits of y = f(x) as x → a there occurs the
symbol +∞, we naturally regard this partial limit as the greatest;
similarly, if the symbol −∞ is a partial limit, we consider it to be
the least. Having agreed on this we can now formulate Property II.


PROPERTY II. Among the partial limits of y = f(x) as x → a there
always exist a greatest and a least.


You understand, of course, that this statement is far from ob- 
vious; there is, for example, neither a greatest nor a least number 
in the open interval (0, 1) and, in general, not every set of numbers
has this property. Thus (as is already evident from Property I), not
every set can qualify as a set of partial limits: it must have certain
specific traits, and in particular must contain a greatest and a least 
number (or symbol). 
Proof. First of all, we may restrict ourselves to the greatest
partial limit, as the proof for the least partial limit is completely
analogous. Further, we may assume that the symbol +∞ is not a
partial limit (otherwise it would be the greatest partial limit and
Property II would be established). By virtue of Property I, it
follows that there are also no partial limits in a neighborhood of the
symbol +∞; in other words, all partial limits are smaller than a
certain number. If the symbol −∞ is the only partial limit, then it
is also the greatest partial limit and Property II is again established.
We may therefore assume that among the partial limits there exists
some number b.

Let us now divide the set of all real numbers into two classes A
and B according to the following rule: if to the right of x there exists
at least one partial limit, then x ∈ A; otherwise x ∈ B. It is easy to
see that this division is a cut. Let α be the edge of this cut; we
shall show that α is the greatest partial limit. First, α is a partial
limit, as otherwise, by virtue of Property I, some neighborhood of α
would contain no partial limit; choose numbers α₁ < α < α₂ such
that the closed interval [α₁, α₂] lies in this neighborhood. But
from the inequality α₁ < α we have α₁ ∈ A, so that there would be
a partial limit to the right of α₁; since it is not in the interval
[α₁, α₂], it must lie to the right of α₂, which is impossible since the
inequality α₂ > α implies that α₂ ∈ B. Second, there can be no
partial limits to the right of α. For if β > α were a partial limit,
then for any γ situated between α and β, α < γ < β, we would
have γ ∈ B from γ > α and, simultaneously, γ ∈ A from β > γ.
Thus α is the greatest partial limit and Property II is established.

The greatest and least partial limits, whose existence is asserted
in Property II, are of great importance: we call them the upper and
lower limits of y = f(x) as x → a and denote them by


$$\varlimsup_{x \to a} f(x) \quad\text{and}\quad \varliminf_{x \to a} f(x),$$

or by

$$\limsup_{x \to a} f(x) \quad\text{and}\quad \liminf_{x \to a} f(x).$$

Obviously,


$$\limsup_{x \to a} f(x) \ge \liminf_{x \to a} f(x)$$


in all cases. These two quantities can assume any value, and each of
them may turn out to be +∞ or −∞. We say that f(x) is bounded
as x → a if its upper and lower limits as x → a are numbers, and
unbounded if at least one of them is the symbol +∞ or −∞.
Evidently the boundedness of f(x) as x → a means the existence
of numbers α and β (α < β) and a neighborhood U of a such that
for any x ∈ U (x ≠ a) we have


$$\alpha < f(x) < \beta.$$
A necessary and sufficient condition for the existence of $\lim_{x \to a} y$
(in the sense of a number or a symbol) is the relation

$$\limsup_{x \to a} y = \liminf_{x \to a} y,$$

since coincidence of the upper and lower limits is equivalent to the
existence of only one partial limit.
Numbers situated between $\liminf_{x \to a} y$ and $\limsup_{x \to a} y$ may or may not be
partial limits as x → a; we have seen above examples of two extreme
cases where (1) none of these numbers was a partial limit and
where (2) each of them was a partial limit. In the general case
some of them will be partial limits, others not.
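For sequences, the upper and lower limits can also be approached through suprema and infima of tails. The Python sketch below relies on a standard fact that is not stated in this section: for a sequence, the upper limit equals the limit as N → ∞ of the supremum of aₙ over n ≥ N, and the lower limit is obtained in the same way with infima. This fact is in the spirit of the second definition of the upper limit given in this section. A finite window N ≤ n < 2N stands in for the tail. For the sequences chosen here (our own examples), the extreme values of the whole tail happen to occur at the first even and first odd index in the window, so the window values are exact. Both window values tend to ±1 for aₙ = (−1)ⁿ(1 + 1/n), which has no limit, and both tend to 0 for bₙ = (−1)ⁿ/n.

```python
from typing import Callable

def window_sup_inf(seq: Callable[[int], float], start: int, stop: int) -> tuple[float, float]:
    """sup and inf of the terms a_n with start <= n < stop (a finite window of the tail)."""
    values = [seq(n) for n in range(start, stop)]
    return max(values), min(values)

def a(n: int) -> float:
    return (-1) ** n * (1 + 1 / n)   # upper limit +1, lower limit -1: no limit

def b(n: int) -> float:
    return (-1) ** n / n             # upper and lower limits both 0: the limit is 0

for N in (10, 100, 1000, 10000):
    sa, ia = window_sup_inf(a, N, 2 * N)
    sb, ib = window_sup_inf(b, N, 2 * N)
    print(N, round(sa, 5), round(ia, 5), "|", round(sb, 5), round(ib, 5))
```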

Sometimes upper and lower limits are defined differently, and it 
is useful to know these different definitions in order to broaden and 
make more concrete our understanding of these concepts, as well 
as to facilitate their application. 


DEFINITION. A number b is called the upper limit of y = f(x) as
x → a if for every interval [α, β] containing b in its interior there
exists an arbitrarily small neighborhood U of a such that f(x) < β for
all x ∈ U (x ≠ a) and α < f(x₀) < β for at least one number x₀ (x₀ ≠ a)
contained in U.


To prove that the new definition is equivalent to the previous
ones, we observe first of all that if a number b is the upper limit
according to the new definition, then b is a partial limit, since any
neighborhood of a contains points x₀ ≠ a at which f(x₀) lies in an
arbitrarily small interval about b. If there were a partial limit b′ > b,
we could find numbers α, β, and α′ such that


$$\alpha < b < \beta < \alpha' < b'.$$


Then, since b′ is a partial limit, any neighborhood U of a will
contain a number x for which α′ < y, whence y > β. But this obviously
contradicts the fact that b is the upper limit according to the new
definition.

Conversely, suppose now that b is the greatest partial limit of
y = f(x) as x → a according to the previous definition; if
α < b < β, then any neighborhood U of a contains a number x for
which α < y < β. To prove that b is the upper limit of y in the
sense of the new definition we therefore have only to show that
with a proper choice of U we shall have y < β for every x ∈ U.

Since the number b is the greatest partial limit, it follows that
+∞ cannot be a partial limit. Therefore, there exists a number y₀
and a neighborhood U₀ of a such that y < y₀ for all x ∈ U₀. If
y₀ ≤ β, there is nothing further to prove. If y₀ > β, then every
number λ in the closed interval [β, y₀], being greater than b, is not a
partial limit; hence it has a neighborhood $V_\lambda$ to which there
corresponds a neighborhood $U_\lambda$ of a such that for any x ∈ $U_\lambda$,
y = f(x) is outside $V_\lambda$. The family {$V_\lambda$} covers the interval [β, y₀].
Applying the Heine-Borel lemma (Lemma 3 of Lecture 1), we can find a
finite subfamily $V_{\lambda_1}, V_{\lambda_2}, \ldots, V_{\lambda_n}$ covering [β, y₀]. Let U denote
the intersection of the neighborhoods U₀, $U_{\lambda_1}, U_{\lambda_2}, \ldots, U_{\lambda_n}$; it is
clear that U is a neighborhood of a and that if x ∈ U, the
corresponding value of y cannot belong to any of the intervals
$V_{\lambda_1}, V_{\lambda_2}, \ldots, V_{\lambda_n}$, and hence does not lie in [β, y₀]. Since, moreover,
y < y₀, it follows that y < β for every x ∈ U, which proves our assertion.

It goes without saying that the lower limit can also be redefined 
in the same way and that the equivalence of the old and new defi- 
nitions can be proved in a completely analogous manner. Similar 
definitions can be formulated for the case when b is +∞ or −∞.

If we wish to view this whole abstract scheme in a more con- 
crete way, a picture presents itself which we may attempt to 
describe without any pretense to formal precision. A variable 
quantity under some process of change approaches now one
number, now another, now a third, and so on. We call the number b a
partial limit of the given quantity if this quantity will after any
point of time, however late, still approach arbitrarily close to the
number b. It is quite possible that in the interval between two
such approaches it departs extremely far from b; what is important
is only that sooner or later it again approaches arbitrarily close
to b. If b is the only such point of gravitation, then our quantity y
not only becomes but finally stays arbitrarily close to b, and
we have $\lim_{x \to a} y = b$.

Generally speaking, y does not tend to a limit, but rather, like 
an undamped pendulum it continues to oscillate within certain 
limits, no matter how long we watch the process of its variation. 
These limits (among which may be +∞ and −∞) are precisely
what we have called the upper and lower limits of the given 
quantity. 
13. LIMITS OF FUNCTIONS OF SEVERAL VARIABLES


There yet remain to be considered a few uncomplicated prob- 
lems in the theory of limits connected with functions of several 
variables. To simplify our exposition we shall speak only of func- 
tions of two independent variables; all that is said in this section 
can also be applied, with appropriate and self-evident changes, to 
functions of any number of variables. 

Every pair of values of the independent variables x and y is 
represented by a point in the coordinate plane. In what follows, for 
brevity, we shall mean by the point (a, b) a pair of values for these
variables, x = a and y = b. By a neighborhood of the point (a, b)
we shall understand any open set¹ in the coordinate plane which
contains this point in its interior; this set may be the interior of a
circle, the interior of a rectangle, or may have a more complicated
form. The term a neighborhood of the point (a, b) can also be given
a meaning when a or b (or both) denotes +∞ or −∞. For example,
if a is a number and b is +∞, then we shall call every domain
of the form α < x < β, y > γ, where α < a < β and the number γ
is arbitrary (Fig. 8), a neighborhood of the point (a, b). For
a = −∞ and b = +∞, a neighborhood of (a, b) is any domain of
the form x < α, y > β, where α and β are arbitrary numbers
(Fig. 9).

Fig. 8


¹An open set in the plane is a set with the following property: if a point belongs to
this set, then there exists a rectangle with its center at this point which is wholly
contained in this set.
We say that the number (or symbol) c is the limit of the
function z = f(x, y) as x → a, y → b, if for every neighborhood V of c
there exists a neighborhood U of (a, b) such that at every point
(x, y) contained in U and different from (a, b) we have z ∈ V. We
write this as follows:


$$\lim_{\substack{x \to a \\ y \to b}} z = c \quad\text{or}\quad z \to c \quad (x \to a,\ y \to b). \qquad (4)$$


With this definition the whole theory of limits for functions of 
one independent variable can easily be extended to the two-dimen- 
sional case: in particular, the concepts of upper and lower limit re- 
tain their definitions and their properties. Cauchy’s condition like- 
wise remains valid. 

As for the proof of the latter, it is entirely analogous to the
one-dimensional case; the only difference is that the lemmas used 
must now be applied in their two-dimensional forms. We need, for 
example, the lemma on nested two-dimensional intervals and the 
two-dimensional form of the Heine-Borel lemma. We can prove 
these in the same way as for the corresponding one-dimensional 
case, and there is no need to dwell on them here. 

It is necessary to differentiate strictly between the double limit
(4) and the iterated limits


$$\lim_{y \to b}\Bigl(\lim_{x \to a} z\Bigr) \quad\text{and}\quad \lim_{x \to a}\Bigl(\lim_{y \to b} z\Bigr). \qquad (5)$$


Here, instead of one two-dimensional passage to the limit, we have
two successive one-dimensional passages to a limit. The geometric
picture connected with neighborhoods of the point (a, b) in the
coordinate plane now disappears completely. Thus to obtain
$\lim_{y \to b}(\lim_{x \to a} z)$ we first have to assign to the variable y any
constant value and, having transformed in this way the quantity z into
a function of one variable x, look for its limit as x → a. This limit
can exist for some values of y and not exist for others. If it exists
for all values of y situated in a neighborhood V of b (with the
possible exception of y = b), then in this neighborhood $\lim_{x \to a} f(x, y)$ is a
function of one variable y, and, in turn, we can inquire about its
limit as y → b. If this limit exists, we denote it by $\lim_{y \to b}(\lim_{x \to a} z)$.


It may happen that both iterated limits (5) exist while the
double limit (4) does not exist. Consider, for example, the behavior
of the function


$$z = \frac{2xy}{x^2 + y^2}$$


in a neighborhood of the origin. (The fact that the function is not
defined at the origin makes no difference, as in the definition of the
double limit as well as of the iterated limits the quantity f(a, b)
plays no part; if desired, we may assign f(0, 0) an arbitrary value.)
For x constant and unequal to zero we have $\lim_{y \to 0} z = 0$, and for y
constant and unequal to zero we have $\lim_{x \to 0} z = 0$; hence


$$\lim_{x \to 0}\Bigl(\lim_{y \to 0} z\Bigr) = \lim_{y \to 0}\Bigl(\lim_{x \to 0} z\Bigr) = 0.$$

On the other hand, an arbitrary neighborhood of the point (0, 0)
contains points for which x = 0 and y ≠ 0, as well as points for
which x = y ≠ 0. Since at points of the first kind z = 0 and at
points of the second kind z = 1, both the numbers 0 and 1 are
partial limits of the function z as x → 0, y → 0; and, consequently,
$\lim_{x \to 0,\, y \to 0} z$ does not exist.
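The behavior of z = 2xy/(x² + y²) near the origin can be checked with exact rational arithmetic. The sample points are our own choice. On the y-axis z is exactly 0, and on the diagonal x = y it is exactly 1, so both 0 and 1 are partial limits at the origin. For a fixed x ≠ 0, the bound |z| ≤ 2|y|/|x| shows that the inner limit as y → 0 is 0.

```python
from fractions import Fraction

def z(x: Fraction, y: Fraction) -> Fraction:
    """z = 2xy / (x^2 + y^2), defined away from the origin."""
    return 2 * x * y / (x * x + y * y)

ts = [Fraction(1, 10 ** k) for k in range(1, 7)]

# Along the y-axis (x = 0, y != 0) z is identically 0; along the diagonal x = y != 0 it is 1.
print(all(z(Fraction(0), t) == 0 for t in ts), all(z(t, t) == 1 for t in ts))  # True True

# Inner limit for a fixed x != 0: |z(x, y)| <= 2|y|/|x|, which tends to 0 as y -> 0.
x0 = Fraction(1, 3)
print([float(z(x0, t)) for t in ts])
```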

We can also have the opposite case where the double limit (4)
exists while neither of the iterated limits exists. Let us consider, for
example, the function


$$z = \begin{cases} (x^2 + y^2)\sin\dfrac{1}{xy} & \text{if } xy \ne 0, \\[4pt] 0 & \text{if } xy = 0, \end{cases}$$


in the vicinity of the origin. Since |z| ≤ x² + y² < ε² for all points
(x, y) in the disc x² + y² < ε², it follows that $\lim_{x \to 0,\, y \to 0} z = 0$.
On the other hand, for a constant x ≠ 0 and y → 0 (as also for a
constant y ≠ 0 and x → 0), sin(1/(xy)), and hence the function z,
do not tend to a limit. Thus the limits $\lim_{x \to 0} z$ (y ≠ 0) and
$\lim_{y \to 0} z$ (x ≠ 0) do not exist, and we cannot even begin to speak of
the existence of the iterated limits (5).
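A numerical look at this example in Python; the sample points are our own choice. On a grid of points near the origin, |z| never exceeds x² + y², which is the bound behind the double limit. With x fixed at x₀ = 0.5, we choose y → 0 so that 1/(x₀y) = ±π/2 + 2πk. Then sin(1/(x₀y)) = ±1, and z = ±(x₀² + y²) approaches both +x₀² and −x₀², so the inner limit as y → 0 cannot exist.

```python
import math

def z(x: float, y: float) -> float:
    """z = (x^2 + y^2) * sin(1/(x*y)) if x*y != 0, and z = 0 if x*y == 0."""
    if x * y == 0:
        return 0.0
    return (x * x + y * y) * math.sin(1.0 / (x * y))

# Double limit: |z| <= x^2 + y^2, so z -> 0 as (x, y) -> (0, 0).
grid = [s * 10.0 ** -k for k in range(1, 6) for s in (1, -1)]
print(all(abs(z(x, y)) <= x * x + y * y for x in grid for y in grid))  # True

# Inner limit for fixed x0 != 0 fails: choose y -> 0 with 1/(x0*y) = +-pi/2 + 2*pi*k,
# so sin(1/(x0*y)) = +-1 and z = +-(x0**2 + y**2), which approaches +x0**2 and -x0**2.
x0 = 0.5
ys_plus = [1.0 / (x0 * (math.pi / 2 + 2 * math.pi * k)) for k in range(1, 6)]
ys_minus = [1.0 / (x0 * (-math.pi / 2 + 2 * math.pi * k)) for k in range(1, 6)]
print([round(z(x0, y), 4) for y in ys_plus])   # values decreasing toward  0.25
print([round(z(x0, y), 4) for y in ys_minus])  # values increasing toward -0.25
```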
However, it is possible to show that if all three limits (4) and
(5) exist, they must be equal. For, suppose that

$$\lim_{x \to a}\Bigl(\lim_{y \to b} z\Bigr) = c. \qquad (6)$$

Let V be any neighborhood of c, and U any neighborhood of
(a, b). Then by virtue of (6), there exists a neighborhood Δ of
a such that for all x ∈ Δ (x ≠ a) we have

$$\lim_{y \to b} z \in V;$$

this neighborhood Δ is an interval (or half-line) on the x-axis.
Obviously, we can select in this interval a number x₀ situated so close
to a that the point (x₀, b) is contained in U (Fig. 10). Fixing this
number x₀, we denote by $c_{x_0}$ the limit of z as y → b with x = x₀;
by the choice of Δ, $c_{x_0}$ ∈ V.

Fig. 10

But by the definition of a limit, this means that for an arbitrary
neighborhood V′ of $c_{x_0}$ it is possible to find a neighborhood
$B_{x_0}$ of b such that z ∈ V′ for x = x₀ and any y ∈ $B_{x_0}$. But nothing
prevents us from selecting the neighborhood V′ so small that
V′ ⊂ V (since $c_{x_0}$ ∈ V); and, on the other hand, since the point
(x₀, b) ∈ U, we can choose a number y₀ ∈ $B_{x_0}$ so close to b that the
point (x₀, y₀) will also be in U (Fig. 10).

Since y₀ ∈ $B_{x_0}$, we have z ∈ V′ ⊂ V at the point (x₀, y₀). Let us
recall now that U is any neighborhood of the point (a, b) and that
V is any neighborhood of c. Thus, the fact that we can always find
a point (x₀, y₀) ∈ U at which z ∈ V means that c is a partial limit
of z as x → a, y → b. If, as we assumed, the limit $\lim_{x \to a,\, y \to b} z$
exists, then it is the only partial limit and, therefore, it has to be
equal to c.
3. Functions


14. WHAT IS A FUNCTION? 


The definition of functional dependence, which we discussed to a 
certain extent at the very beginning of Lecture 1, originated and 
finally won acceptance amid strong opposition. Even today the 
echoes of this conflict have not yet completely died down, although 
it seems that the scientific value of this definition is no longer 
disputed.¹

The basic aim of this struggle was to overcome the predomi- 
nance of the analytical apparatus which, since the eighteenth cen- 
tury, has weighed heavily upon the idea of functional dependence. 
This predominance, which transformed the analytic expression from 
a convenient instrument into a despotic ruler over the idea of a 
function, has been more or less completely eliminated within 
mathematics itself. But in the applied sciences and in the schools 
(even in higher technical education) it still prevails to a consider- 
able degree. Almost every engineer, while accepting the formal 
scientific definition of functional dependence (in which not a 
word is said about an analytic expression), still visualizes a func- 
tion primarily as a formula, as an analytic expression, and, in gen- 
eral, cannot think about functional dependence except in these 
terms. Thus he differs in an essential way from the mathematician, 
who is accustomed to taking the definitions of his concepts seriously 
and, therefore, at the word function thinks always of a correspond- 
ence between two sets without associating it with any analytical 
apparatus. In view of this difference, we shall have to dwell at 
some length on the idea of functional dependence and other related 
notions. 

It will perhaps be best, instead of entering into a detailed and 
systematic analysis (for which we have not enough time anyway), 
simply to touch on a few of the more typical sharp conflicts which 
arise in this connection between the notion of functional dependence entertained by a mathematician and that entertained by a man educated in the older traditions. University professors experience conflicts of this type every year with their freshmen, who bring from their secondary schools educational habits and traditions of past centuries.

¹ In the United States of America a number of writers prefer a set-theoretic definition of function. Various points of view are discussed further in this lecture.

Let us consider the analytic expression of the function represented graphically in Figure 11.

Fig. 11

One glance at this simple graph will convince us that the idea of a real quantity y which varies in accordance with the law indicated by the graph does not contain in itself anything unacceptable, and an engineer or physicist will feel even more confident of this statement than a mathematician. When set to the task of representing the given function analytically, a mathematician immediately writes down the only solution acceptable from his point of view:

$$y = \begin{cases} 1 + x & \text{if } x < 0,\\ 1 - x & \text{if } x \ge 0, \end{cases} \qquad (1)$$

and right here the conflict begins. An engineer (and let us, for brevity, give this name to a representative of the obsolete traditions; may our engineer friends in the audience forgive us) immediately objects, declaring to the mathematician that he has “written not one but two functions.” To this the mathematician replies:

(a) It is impossible to write a function; one can only write an 
analytic expression. 

(b) He, the mathematician, actually wrote two analytic expressions, indicating at the same time precisely for what values of the independent variable x one should apply the one or the other of these expressions in order to compute the corresponding values of the function y.

(c) In accordance with the definition adopted for functional dependence, the two analytic expressions in (1) determine exactly one function, since they assign to each value of x only one value of y.


(d) The expression (1) exactly represents the dependence given 
graphically in Figure 11, and, therefore, completely solves 
the stated problem. 

The mathematician might also add (but does not do so for pedagogical reasons) that in case of insistent demand he could also represent the function shown in Figure 11 by a single formula

$$y = 1 - |x|, \qquad (2)$$

valid for all x. He simply considers the expression (1) to be more convenient than the expression (2) and, at the same time, equally legitimate.
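A small check that (1) and (2) describe one and the same correspondence. This is a sketch assuming that (1) reads y = 1 + x for x < 0 and y = 1 − x for x ≥ 0 (the graph of Fig. 11); the function names are hypothetical. Comparing the two rules on a grid of points (exactly representable in floating point) finds no disagreement, which is just the mathematician's point: two analytic expressions, one function.

```python
def y_two_expressions(x: float) -> float:
    # Formula (1): one rule for x < 0 and another for x >= 0.
    if x < 0:
        return 1 + x
    return 1 - x

def y_single_formula(x: float) -> float:
    # Formula (2): the same correspondence written as 1 - |x|.
    return 1 - abs(x)

points = [k / 4 for k in range(-12, 13)]   # -3.0, -2.75, ..., 3.0
same = all(y_two_expressions(x) == y_single_formula(x) for x in points)
print(same)  # True
```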

Recalling the definition of functional dependence, the mathematician may go even further and insist, for example, that the expression

$$y = \begin{cases} x & \text{if } x < 0,\\ \tfrac{1}{2} & \text{if } x = 0,\\ 1 + x & \text{if } x > 0, \end{cases}$$

legitimately represents one single function (Fig. 12).

Fig. 12

We do not know what objections the engineer might raise to this, but on the basis of personal experience we are inclined to make the following prognosis: even if he does not express any objections, he will still remain unsatisfied. Habits acquired over the years cannot be eradicated by a single short discussion.

On another occasion the mathematician might define the so-called Dirichlet function:

$$y = \begin{cases} 1 & \text{when } x \text{ is a rational number},\\ 0 & \text{when } x \text{ is an irrational number}. \end{cases}$$

The perplexed engineer will ask, “But what sort of function is this?
It cannot be written by means of a formula, nor represented by a 
graph.” To this the mathematician will reply: “In the definition of 
functional dependence nothing is said either of an analytical ex- 
pression or of the geometric representation of the function, and, 
therefore, the question of whether this is a genuine function in no 
way depends on whether the function can be given analytically or 
represented geometrically. The definition of the Dirichlet function 
just formulated attributes to every value of the quantity x a unique 
value of the quantity y and, therefore, is irreproachable. Moreover, 
although the geometric representation of the Dirichlet function is 
difficult, its expression by means of a formula is very simple; it 
is enough to denote it by f(x).” 

The engineer, who was listening quietly at first, feels sincere indignation at this last remark, which results in the following dialogue:


E. But is this a formula? 

M. What do you call a formula? 

E. Well, some analytic expression such as y = x² − x³ or y = sin x; but what sort of expression is this y = f(x)?

M. Very well. You mean that the notation y = sin x for the familiar function called sine is, in your view, an analytic expression, while the notation y = f(x) for the Dirichlet function is not an analytic expression? What, then, is the essential difference between the symbols “sin” and “f( )”?

E. But every literate person knows what the formula y = sin x 
means, while the notation y = f(x) for what you call the 
Dirichlet function has just been devised by you and is 
unknown to anyone else. 

M. Now it seems that we understand each other. The distinc- 
tion which you indicate (and which I do not deny) is, as 
you yourself realize, not fundamental but historical. For 
there must have been a moment when someone for the first 
time proposed to denote by sin x a function for which no 
generally accepted notation had existed until then. That 
moment was for the function y = sinx the same as this 
moment is for the Dirichlet function y = f(x). Would you 
say that the function y = sin x was not analytically express- 
ible until the suggested notation took root and was generally 
accepted? And should the notation y = f(x), which I have 
just proposed for the Dirichlet function, become accepted
by the whole scientific world in a few years, would you then 
say that it became a formula, an analytic expression, and 
that the Dirichlet function itself became analytically express- 
ible? It is clear, of course, that an interpretation of analytic 
expressibility that is based on scientific fashion has not the 
slightest mathematical value. But if we reject this interpretation, you cannot but admit that the symbol “f( )” is fundamentally as legitimate as the symbol “sin”; and therefore
the Dirichlet function is analytically expressible to the same 
degree and in the same sense as the sine and cosine functions. 
Generally speaking, a discussion of the analytic expressibil- 
ity of a function must in all cases be recognized as 
pointless, since we can designate the function by any sym- 
bol and rightfully consider this designation as its analytic 
expression. Finally, I can tell you that we know how to ex- 
press the Dirichlet function by means of symbols familiar 
to you. But we almost never use this expression because it is
complicated and does not offer us the possibility of learn- 
ing anything essential about the properties of this function, 
whereas the definition which we gave above shows these 
properties very clearly. In general, we do not like making 
fetishes out of analytic expressions; we gladly use them in 
those cases where they are helpful in studying a given func- 
tional dependence, and reject them without regret if the in- 
vestigation is simpler without them, which is true, for 
example, in the case we have just considered. 


At this point, we must terminate the discussion, since we fail to see what further objections could be raised by our engineer. On the other hand, our hypothetical mathematician has so clearly characterized the attitude of contemporary mathematics to the relationship between functions and analytic expressions that the conclusions will be evident without further comment.¹

¹ Although the author goes to great lengths to rule out analytic expressibility as a meaningful qualifying condition for a function, the reader should be warned that in higher mathematics the term analytic alone, when applied to a function, has a definite and precise meaning.
15. THE DOMAIN OF A FUNCTION


We must now concern ourselves with one more question closely 
connected with the foregoing. For a function y = f(x) to be con- 
sidered as given, it is by no means necessary to define it for all 
values of the variable x. It often happens that only a portion 
of these values is of any interest, so that to define the function f(x) 
outside this portion appears pointless. The reasons for such a 
restriction of the set of values of x can be most varied, purely 
logical as well as practical: we have already discussed this, in 
part, in Lecture 1. If, for example, we define f(x) as the perimeter of a regular polygon of x sides inscribed in a circle of radius 1, it is self-evident that the function is determined by this condition for all integral x ≥ 3, and only for these values; we say that y = f(x) is in this case a function of x defined on the set of all integers x ≥ 3. Formally, of course, nothing prevents us from completing the function f(x) by assigning to it arbitrary values for the remaining values of x; but if there is no need for this we do not do it, leaving the function f(x) completely undefined outside this set. Another example: if in a given physical investigation x designates the temperature of an object expressed in degrees centigrade, it would be absurd to define a function f(x) for values less than −273.
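In programming terms, a function with a restricted domain is one that simply refuses arguments outside that set. The following Python sketch is our own illustration, not from the text: it uses the elementary formula 2n·sin(π/n) for the perimeter of a regular n-gon inscribed in a circle of radius 1, and leaves the function undefined (raises an error) for anything other than an integer n ≥ 3.

```python
import math

def perimeter(n: int) -> float:
    """Perimeter of a regular n-gon inscribed in a circle of radius 1.

    Defined only for integers n >= 3; outside that set we deliberately
    leave the function undefined (raise) instead of inventing values.
    """
    if not isinstance(n, int) or n < 3:
        raise ValueError("defined only for integers n >= 3")
    return 2 * n * math.sin(math.pi / n)   # n sides, each of length 2 sin(pi/n)

print(round(perimeter(6), 6))  # 6.0 (hexagon with unit sides)
```

As n grows, perimeter(n) approaches 2π from below, but no value for n = 2 or n = 2.5 is ever invented.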

It would, therefore, serve no purpose to require that every function y = f(x) be defined for all values of x. To insist on this would
force us in practice to extend most functions in an artificial and 
completely useless manner beyond their domains of original defi- 
nition. This forces us to introduce a necessary clarification into the 
definition of functional dependence. 

Let us agree that y = f(x) is a function on the set M if to each 
value x ∈ M there corresponds a unique value of y. This definition,
which is somewhat more precise than the one we gave at the be- 
ginning of Lecture 1, immediately resolves all doubts and allows us 
in studying any individual function to restrict ourselves to that set 
of values of x which is dictated by the aim of the given investiga- 
tion (this set being the natural domain of the function). 

The choice of the domain for a function can be dictated, as we 
already know, by purely mathematical as well as by other consider- 
ations. It is indispensable to caution here against one regrettable 
confusion which results from the persistence of the tradition of identifying a function with an analytic expression. Frequently,
people want to make the domain of definition of a function depend 
on the set in which this or that analytic expression has a meaning. 
They say, for example, that the domain of the function $\sqrt{1 - x^2}$ is the interval [−1, +1] or that the domain of the function log x is the half-line x > 0, when, in fact, they are talking not about the domain of definition of a function but about the set in which a given analytic expression makes sense. For example, it can easily happen (and a number of these examples are known in mathematics) that we have to deal with a function y = f(x) defined on the interval [0, 2] and of real interest to us in this whole interval, and which for 0 ≤ x ≤ 1 can be expressed by the formula $y = \sqrt{1 - x^2}$. It does
not follow from this, however, that the interval (1, 2] lies outside 
the domain of our function and is of no interest to us. On the con- 
trary, we shall look for another analytic expression of this function 
(based, of course, on its definition) for the interval (1, 2]; and if we 
fail to find one, we shall investigate the function in this interval by 
other, nonanalytic, methods. 

We choose the domain of a function on the basis of considera- 
tions either purely mathematical or dictated by its applications, but 
in every case these considerations are based on the essence of the 
matter and should never be tied to the purely formal characteristics 
of any particular analytical apparatus. 


16. CONTINUITY OF A FUNCTION 


In beginning the investigation of functional dependence, we 
must first of all introduce, with the help of an appropriate system 
of classification, a certain amount of order into the diversity of our 
subject matter. The first such classifying and organizing principle 
is usually (and justly) the separation of all functions into contin- 
uous and discontinuous functions. Actually, mathematical analysis 
deals almost exclusively with continuous functions, taking into con- 
sideration only in relatively rare instances a few of the simpler 
types of discontinuous functions. Continuous functions have a 
number of special properties which discontinuous functions in gen- 
eral do not have. As a consequence of these properties, the study 
and application of continuous functions is considerably simplified, 
and thus the investigation of these properties is extremely impor- 
tant for analysis. 
We say that a function y = f(x) is continuous at x = a (or continuous at the point a) if

$$\lim_{x\to a} f(x) = f(a),$$

or, equivalently, by the definition of a limit, if for an arbitrary neighborhood V of f(a) there exists a neighborhood U of a such that for any x ∈ U we have f(x) ∈ V. Thus, for the continuity of a function f(x) at the point a it is necessary first that the limit

$$\lim_{x\to a} f(x)$$

exist, and second that this limit coincide with the value of the function at x = a. It is clear that the second condition does not follow from the first, as the example of the function

$$y = \begin{cases} x^2 & \text{if } x \ne 0,\\ 1 & \text{if } x = 0, \end{cases} \qquad (3)$$

shows.

With regard to this definition, we should note first of all that 
continuity understood in this way is a local property of the function, 
that is, a property which may hold at one point and not at another. 
For example, the function (3) is discontinuous at x = 0 and con- 
tinuous at every other value of x. This distinction between local 
and global (nonlocal) properties of a function is very important 
and should always be kept in mind. 
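To see the two conditions separate in a concrete case, here is a Python sketch of function (3), under the reading that it equals x² for x ≠ 0 and takes the value 1 at x = 0 (any value other than 0 at the origin would serve the same purpose). The values at x = 10⁻¹, 10⁻², ... shrink toward the limit 0, while the value at 0 is 1, so the first condition holds and the second fails; a finite table of values is, of course, only suggestive.

```python
def f3(x: float) -> float:
    # Function (3) as read here: x^2 away from 0, value 1 at x = 0.
    return 1.0 if x == 0 else x * x

approach = [f3(10.0 ** -k) for k in range(1, 8)]  # f3 at 0.1, 0.01, ..., 1e-7
print(approach[-1] < 1e-12, f3(0.0))  # True 1.0
```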

Further, we call a function continuous in a closed interval [a, b] if it is continuous, in the sense stated above, at every point of this interval. However, at the end point a we require continuity only from the right, that is,

$$\lim_{x\to a+0} f(x) = f(a),$$

and at the end point b continuity only from the left, defined analogously. In the case of an open interval (a, b) nothing, of course, is required of the function at the points a and b.

It may be noted, incidentally, that mathematicians have long 
used the very convenient notation

$$f(c + 0) = \lim_{x\to c+0} f(x) \quad\text{and}\quad f(c - 0) = \lim_{x\to c-0} f(x),$$

by means of which it is possible to write the definition of continuity of a function f(x) at the point c in the form of a very simple relation:

$$f(c - 0) = f(c) = f(c + 0).$$


This notation cannot lead to any misunderstanding if we simply 
remember that f(c + 0) and f(c — 0) are not values of the function 
at any point, but limits of these values as x varies in a certain defi- 
nite manner. 
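The notation f(c ± 0) can be illustrated with the function of Fig. 12 (x for x < 0, ½ at x = 0, 1 + x for x > 0). The Python sketch below is only a crude numerical stand-in for a limit: it assumes nothing beyond evaluating the function at one point very close to c on the chosen side, and the helper name `one_sided` is ours. At c = 0 it gives approximately f(0 − 0) = 0 and f(0 + 0) = 1, which differ from each other and from f(0) = ½, so the relation f(c − 0) = f(c) = f(c + 0) fails there; at c = 2 all three agree.

```python
def g(x: float) -> float:
    # The function of Fig. 12: x for x < 0, 1/2 at x = 0, 1 + x for x > 0.
    if x < 0:
        return x
    if x == 0:
        return 0.5
    return 1 + x

def one_sided(f, c: float, side: int, steps: int = 30) -> float:
    """Crude estimate of f(c + 0) (side=+1) or f(c - 0) (side=-1):
    evaluate f at the single point c + side * 2**-steps."""
    h = 2.0 ** -steps
    return f(c + side * h)

print(one_sided(g, 0.0, -1), g(0.0), one_sided(g, 0.0, +1))
```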


17. BOUNDED FUNCTIONS 


We now have to get acquainted with another property of functions which, in contrast to continuity, is not a local property but a
global one; that is, it is defined for a set of values of the independ- 
ent variable without having previously been defined for individual 
values (at individual points). 

A function y = f(x) is said to be bounded on the set M if all the 
values it assumes on this set are contained in some finite interval. 
We can replace this definition by another requirement which is 
entirely equivalent: the existence of a positive number c such 
that |f(x)| < c for all x ∈ M. In addition, we say that f(x) is
bounded above (or below) on the set M if there exists a number c
such that

$$f(x) < c \quad (f(x) > c) \quad \text{for all } x \in M.$$


For a function to be bounded it is of course necessary that it be 
bounded above and below. 

The property of boundedness does not, like the property of 
continuity on a set, merely mean that certain requirements are 
satisfied at each individual point. If we only wanted to find for 
each individual point x in the domain of the function a number c 
such that |f(x)| < c, then this could always be trivially accomplished by taking c = |f(x)| + 1, and thus every function is bounded
at each individual point in its domain. This does not mean, how- 
ever, that the function is bounded. For this to be so, it is necessary 
to find a number c which will at once serve as a bound at all points 
in the domain. To see how a function may be defined at every 
point of an interval without being bounded on this interval, let us 
recall that tan x increases indefinitely as x → π/2 − 0 and consequently the function

$$y = \begin{cases} \tan x & \text{if } 0 \le x < \tfrac{\pi}{2},\\ 0 & \text{if } x = \tfrac{\pi}{2}, \end{cases}$$

is not bounded on the interval [0, π/2].
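The difference between “bounded at each point” and “bounded” can be seen in a few lines of Python. The sketch below is our own (the names `h` and `point_exceeding` are hypothetical): every single value h(x) is an ordinary finite number, yet for any proposed bound c we can walk toward π/2 until the value exceeds c, so no single c serves for the whole interval. Floating point limits how close we can get to π/2, so the search is restricted to moderate bounds.

```python
import math

HALF_PI = math.pi / 2

def h(x: float) -> float:
    # tan x on [0, pi/2); at x = pi/2 we assign the value 0 (the choice is immaterial).
    return 0.0 if x == HALF_PI else math.tan(x)

def point_exceeding(bound: float) -> float:
    """Return some x in [0, pi/2) with h(x) > bound.

    Repeatedly halves the distance to pi/2; this terminates because
    tan x grows without limit as x -> pi/2 - 0. Bounds are capped because
    floating-point numbers cannot get arbitrarily close to pi/2.
    """
    if bound > 1e12:
        raise ValueError("bound too large for floating-point search")
    x = 0.0
    while h(x) <= bound:
        x = (x + HALF_PI) / 2
    return x

for c in (10.0, 1e3, 1e6):
    x = point_exceeding(c)
    print(f"c = {c:g}: x = {x:.12f}, h(x) = {h(x):.3e}")
```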


As in the case of many global properties, it is possible to find 
for the boundedness of a function in a given interval a local prop- 
erty whose existence at every point of the interval is equivalent to 
the existence of the global property. We shall call a function 
y = f(x) bounded at the point x if this function is bounded in 
a neighborhood U of x. (Note that the local property here is 
defined in terms of the previously defined global property; in the 
case of continuity the situation was exactly the reverse.) We can now 
assert that a necessary and sufficient condition for the boundedness of a function y = f(x) on the closed interval [a, b] is that it be bounded at each point of that interval. The necessity of this condition follows immediately from the definition. To show its sufficiency, suppose that every point x of [a, b] is contained in a neighborhood $U_x$ in which y = f(x) is bounded. Applying the Heine-Borel lemma, we find that the interval [a, b] can be covered by a finite number of intervals $\Delta_1, \Delta_2, \ldots, \Delta_n$, in each of which y is bounded. If $|y| < c_i$ in the interval $\Delta_i$ (i = 1, 2, ..., n) and if c is the largest of the numbers $c_1, c_2, \ldots, c_n$, then |y| < c for all x ∈ [a, b], which proves our assertion.


Let us agree to call a set of numbers N bounded if all the num- 
bers belonging to it are contained in some interval. It is obvious 
that the boundedness of y = f(x) on the set M is equivalent to the 
boundedness of the set N of all values assumed by this function 
on M. The meaning of the expressions the set N is bounded below 
(or on the left) and the set N is bounded above (or on the right) 
is self-evident. 

Let us agree to call the number β the least upper bound (l.u.b.) of N if:

(1) the set N does not contain numbers greater than β, and

(2) in every neighborhood of β there is a number belonging to N.
Similarly, we call a number α the greatest lower bound (g.l.b.) of the set N if:

(1) there are no numbers in N less than α, and

(2) in every neighborhood of α there is a number belonging to N.


It is evident that a set which has a least upper (greatest lower) 
bound is bounded above (below). In analysis, the converse theorem 
plays a significant role. 


THEOREM 1. Every nonempty set of numbers bounded above 
(below) has precisely one least upper (greatest lower) bound. In par- 
ticular, any nonempty bounded set has both a least upper bound and 
a greatest lower bound. 


Proof. Denoting our nonempty set, bounded above, by N, let us divide the set of all real numbers into two classes A and B according to the following principle: x ∈ A if to the right of x there exists at least one point of the set N, and x ∈ B otherwise. One can readily see that this partition is a cut. Let α be its edge. We show that α is the l.u.b. of N.

First, we shall show that there are no points of N to the right of α. For, if β ∈ N and β > α, then setting γ = ½(α + β) we have α < γ < β. From γ > α it follows that γ ∈ B, while from β > γ and β ∈ N it follows that γ ∈ A; we thus arrive at a contradiction.

Further, let U = (α₁, α₂) be any neighborhood of α, α₁ < α < α₂. It is obvious that α₁ ∈ A and α₂ ∈ B. By virtue of the first of these relations there exist points of N situated to the right of α₁, while, as was just shown, there are no points of N to the right of α. Hence all points of N situated to the right of α₁ belong to the interval (α₁, α₂), so that U contains points of N. Thus, the number α has both of the properties of a l.u.b. and is established as such.

This bound is unique. Indeed, if the set N had two least upper bounds β and β′, β < β′, we would arrive at a contradiction immediately upon noting that, on the one hand, N cannot contain any numbers larger than β (as β is a least upper bound) and, on the other hand, it would have to contain such numbers because every neighborhood of β′, in particular one lying entirely to the right of β, would have to contain them. The proof of the theorem for the case of the greatest lower bound is, of course, completely analogous.
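The proof locates the l.u.b. as the edge of a cut. When we can decide, for a given x, whether some point of N lies to the right of x, that edge can be approximated by repeated halving. The Python sketch below assumes such a membership test is available (for the illustrative set N = {x : x² < 2} it is “x < 0 or x² < 2”, so no square root is needed); the function names are hypothetical, and the result is an approximation to within `tol`, not the exact edge.

```python
def lub_by_cut(in_lower_class, lo: float, hi: float, tol: float = 1e-12) -> float:
    """Approximate the edge of the cut (A, B) described in the proof.

    in_lower_class(x) must return True exactly when some point of N lies
    to the right of x (x in A). We need lo in A and hi in B to start.
    """
    if not (in_lower_class(lo) and not in_lower_class(hi)):
        raise ValueError("need lo in class A and hi in class B")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if in_lower_class(mid):
            lo = mid        # some point of N lies right of mid: edge >= mid
        else:
            hi = mid        # no point of N lies right of mid: edge <= mid
    return (lo + hi) / 2

# Example: N = {x : x*x < 2}. Some point of N lies to the right of x exactly
# when x < 0 or x*x < 2.
def has_point_of_N_right_of(x: float) -> bool:
    return x < 0 or x * x < 2

beta = lub_by_cut(has_point_of_N_right_of, 0.0, 2.0)
print(round(beta, 9))  # 1.414213562
```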


We can visualize these bounds of a set as the end points of the 
smallest interval containing all the numbers of this set. Evidently, 
we can also define the l.u.b. of a set N as the least number c such that x ≤ c for all x ∈ N. An analogous definition can be given for the g.l.b. It is important to remember that each of these bounds of a set may or may not belong to this set. For example, a closed interval contains its bounds, but an open one does not; the set of numbers 1/n, where n is any positive integer, contains its l.u.b. (the number 1) but does not contain its g.l.b. (the number 0).

If y = f(x) is bounded on a set M then, as we have already 
mentioned, the set of values assumed by this function on M is 
bounded, and thus, by the theorem proved above, has least upper 
and greatest lower bounds. These are called the bounds of f(x) on 
M. Hence, every function bounded on a given set has exactly one 
l.u.b. and one g.l.b. on this set. Each of these bounds may turn out
to be a value of the function; then it is the greatest or the least value 
assumed by this function on the given set. It may happen, how- 
ever, that one bound or the other is not among the values of the 
function on the given set. In such a case the function does not 
assume a greatest or a least value on this set. For example, 


$$y = \begin{cases} x & \text{if } 0 \le x < 1,\\ 0 & \text{if } x = 1, \end{cases} \qquad (4)$$

clearly has the bounds 0 and 1 on the interval [0, 1]. The g.l.b. is a value (the least) of the function, but the l.u.b. (the number 1) does not belong to the values of the function and the function does not assume a greatest value on the interval [0, 1].
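A quick tabulation makes the behavior of (4) concrete. In this Python sketch (our own, assuming (4) is y = x for 0 ≤ x < 1 and y = 0 at x = 1) the sampled values reach the least value 0 but stay below 1; points closer to 1 push the values closer to 1 without ever reaching it.

```python
def f4(x: float) -> float:
    # Function (4): y = x for 0 <= x < 1, and y = 0 at x = 1.
    return 0.0 if x == 1 else x

grid = [k / 1000 for k in range(1001)]   # 0.0, 0.001, ..., 1.0
values = [f4(x) for x in grid]
print(min(values), max(values))          # 0.0 0.999
print(f4(1 - 1e-9) < 1)                  # True
```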


18. BASIC PROPERTIES OF CONTINUOUS FUNCTIONS 


We shall now establish four very important properties of con- 
tinuous functions. 


LEMMA. If y = f(x) is continuous at the point a and if f(a) < b, then there exists a neighborhood U of a such that f(x) < b for all x ∈ U.


Proof. This is an almost trivial conclusion drawn from the very definition of continuity of a function at a point. Indeed, if α and β are selected so that α < f(a) < β < b then, by the definition just mentioned, we shall have α < f(x) < β < b for all x in some neighborhood of a. And it is obvious that the lemma remains valid if we replace the sign < by the sign >.
THEOREM 2. A function y = f(x) continuous in a closed interval [a, b] is bounded in this interval.


Proof. Let us observe that f(x), being continuous in [a, b], is continuous at each point of this interval. Let λ be a point of [a, b]; from the definition of continuity it follows that there exists a neighborhood U of λ such that for any x ∈ U we have

$$f(\lambda) - 1 < f(x) < f(\lambda) + 1.$$

But this means that f(x) is bounded in U (all of its values are included in a certain interval). Thus, the function is bounded at the point λ, since we have defined boundedness at a point as boundedness in a neighborhood of this point. And since f(x) is bounded at each point of the interval [a, b], it must also be bounded on this whole interval (as we have already learned on p. 53).


THEOREM 3. A function y = f(x) continuous in a closed interval 
[a, b] assumes on this interval a greatest and a least value. 


Proof. By Theorem 2, y is bounded in [a, b] and therefore has a g.l.b. α and a l.u.b. β; we have only to show that these bounds are values assumed by y. It is sufficient to show this for the l.u.b. β.

If β were not a value of f(x) for any x ∈ [a, b], we would have f(x) < β for all x in this interval. For each x let $\beta_x$ be any number between f(x) and β, so that $f(x) < \beta_x < \beta$. By virtue of our lemma we can find a neighborhood $U_x$ of each x such that $f(x') < \beta_x$ for all $x' \in U_x$. The family of neighborhoods $\{U_x\}$ constructed in this manner covers the interval [a, b], and by the Heine-Borel lemma it contains a finite subfamily $U_{x_1}, U_{x_2}, \ldots, U_{x_n}$ which also covers [a, b]. But for every $x' \in U_{x_i}$ we have $f(x') < \beta_{x_i}$; denoting by β′ the largest of the numbers $\beta_{x_1}, \beta_{x_2}, \ldots, \beta_{x_n}$ (a largest of finitely many numbers, each less than β), we see that for all x ∈ [a, b] we have

$$f(x) < \beta' < \beta.$$

Consequently, β could not be the least upper bound of f(x) on the interval [a, b], and this contradiction proves our theorem.


We saw above (function (4)) an example of a bounded function 
which does not assume a greatest value on a given interval. We 
now know that this is possible only for discontinuous functions, and 
indeed the function (4) is discontinuous at x = 1. 
It is also important to emphasize that Theorem 3 is valid only for closed intervals. For example, even such simple functions as y = x and y = x² need not assume a greatest and a least value on open intervals.


THEOREM 4. If a function f(x) is continuous on a closed interval [a, b] and if

$$f(a) < \mu < f(b) \quad\text{or}\quad f(a) > \mu > f(b),$$

then there exists a number c in the interval [a, b] such that f(c) = μ.


In short, a continuous function assumes all intermediate values between any two of its values. It is clear that discontinuous functions, in general, do not have this property. For example, the Dirichlet function assumes in every interval the values 0 and 1, but does not assume any intermediate value. For the function (4) we have f(½) = ½ and f(1) = 0, but at no point in the interval [½, 1] does the function assume the intermediate value ¼.


Proof of Theorem 4. Let us suppose first that μ = 0, f(a) < 0, and f(b) > 0; we have to prove that at some interior point of the interval [a, b] the function f(x) assumes the value zero. Let us assume the contrary and divide the interval [a, b] into two halves. It is clear that in one of the two halves the function assumes at the end points values of opposite sign; we divide this half again into two equal parts and we select again that half at whose end points the function assumes values of opposite sign, and so on. In this way we obtain a sequence of nested intervals; let α be their common point. By our assumption, f(α) ≠ 0. Suppose that f(α) < 0. Then by the lemma on p. 55 we have f(x) < 0 for all x in some neighborhood U of α. This, however, is impossible, as U contains all but finitely many of our nested intervals (their lengths tend to zero), and at the end points of each of them f(x) assumes values of opposite sign. In the same manner, we can prove that the inequality f(α) > 0 is impossible. The contradiction thus obtained proves our theorem in this case. If μ ≠ 0, it is sufficient to apply the result just obtained to the function f(x) − μ when f(a) < μ < f(b), or to μ − f(x) when f(a) > μ > f(b), and our theorem is established.
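The halving argument in this proof is exactly the bisection method of numerical analysis. The Python sketch below is an illustration under the assumptions of the theorem (f continuous on [a, b]); the function names and the tolerance are our own choices. `bisect_root` handles the case f(a) < 0 < f(b) treated first in the proof, and `intermediate_point` reduces the general case to it through f(x) − μ or μ − f(x). Unlike the proof, the program must stop after finitely many steps, so it returns an approximation whose error is at most the tolerance.

```python
import math

def bisect_root(f, a: float, b: float, tol: float = 1e-12) -> float:
    """Find c in [a, b] with f(c) close to 0, given f(a) < 0 < f(b).
    Mirrors the nested-interval argument: keep the half whose end points
    give values of opposite sign."""
    if not (f(a) < 0 < f(b)):
        raise ValueError("need f(a) < 0 < f(b)")
    while b - a > tol:
        m = (a + b) / 2
        fm = f(m)
        if fm == 0:
            return m
        if fm < 0:
            a = m          # the sign change now lies in [m, b]
        else:
            b = m          # the sign change now lies in [a, m]
    return (a + b) / 2

def intermediate_point(f, a: float, b: float, mu: float, tol: float = 1e-12) -> float:
    # General case of Theorem 4: apply the zero case to f(x) - mu or mu - f(x).
    if f(a) < mu < f(b):
        return bisect_root(lambda x: f(x) - mu, a, b, tol)
    if f(a) > mu > f(b):
        return bisect_root(lambda x: mu - f(x), a, b, tol)
    raise ValueError("mu must lie strictly between f(a) and f(b)")

c = intermediate_point(math.cos, 0.0, math.pi, 0.5)
print(round(c, 9))   # 1.047197551  (pi/3, since cos(pi/3) = 1/2)
```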
Before we formulate the next property of continuous functions,
we have to introduce a new concept of great importance to 
the study of this class of functions. We remember, of course, that 
continuity is a local property of a function. This situation is not 
changed at all by the fact that we speak of a function continuous in 
a given interval, as continuity in an interval means nothing more 
than continuity at each point of this interval and, therefore, does 
not at all change the local character of this concept. It is possible, 
however, to express the idea of continuity of a function in a given 
interval as a global property, that is, a property which is not stated 
in terms of the behavior of the function in the neighborhood of each 
point, but rather one which describes the behavior of the function 
on the interval as a whole. 


DEFINITION. We shall call a function y = f(x) uniformly continuous on the interval [a, b] if this function has the following property: for any positive number ε, no matter how small, there exists another positive number δ such that for any two numbers x₁ and x₂ in [a, b] differing from each other by less than δ, we have

$$|f(x_1) - f(x_2)| < \varepsilon.$$


As you see, this definition does not refer to any individual point but attempts to characterize the behavior of a function on the whole interval [a, b], stating that at any two points sufficiently close to each other the values assumed by the function differ by as little as we please. It is evident why we call this kind of continuity uniform. Its specific character is that it requires some uniformity in the behavior of the function in every part of the given interval; the points x₁ and x₂ can be taken anywhere in the interval [a, b] as long as the distance between them is less than δ, and the same δ serves for the whole interval.
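For a concrete instance of the definition, take f(x) = x² on [0, 1] (our own example, not from the text). Since |x₁² − x₂²| = |x₁ + x₂|·|x₁ − x₂| ≤ 2|x₁ − x₂| there, the single choice δ = ε/2 works everywhere in the interval at once. The Python sketch below spot-checks this with random pairs of nearby points; random testing can only fail to find a counterexample, it cannot replace the one-line inequality.

```python
import random

def delta_for_square(eps: float) -> float:
    # On [0, 1]: |x1^2 - x2^2| = |x1 + x2| * |x1 - x2| <= 2 |x1 - x2|,
    # so delta = eps / 2 works for every pair of points at once.
    return eps / 2

def check_uniform(eps: float, trials: int = 20_000, seed: int = 0) -> bool:
    """Spot-check the definition for f(x) = x^2 on [0, 1] with random pairs."""
    rng = random.Random(seed)
    delta = delta_for_square(eps)
    for _ in range(trials):
        x1 = rng.uniform(0.0, 1.0)
        x2 = min(1.0, max(0.0, x1 + rng.uniform(-delta, delta)))  # stay in [0, 1]
        if abs(x1 - x2) < delta and abs(x1 * x1 - x2 * x2) >= eps:
            return False   # a counterexample would land here
    return True

print(all(check_uniform(e) for e in (1.0, 0.1, 1e-3)))  # True
```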

It is quite clear that a function uniformly continuous on [a, b] is necessarily continuous at each point of this interval (and thus is continuous on the whole interval). For by virtue of uniform continuity, the inequality

$$|x - a| < \delta$$

implies the inequality

$$|f(x) - f(a)| < \varepsilon,$$

that is, for any neighborhood V = (f(a) − ε, f(a) + ε) of f(a) there exists a neighborhood U = (a − δ, a + δ) of a such that if x ∈ U, then f(x) ∈ V. And this, of course, means that the function f(x) is continuous at x = a.

It is extremely important that the converse theorem is also valid: that is, from the continuity of a function at each point of a closed interval [a, b] there follows the uniform continuity of the function on this interval. Thus, the requirement of uniform continuity does not narrow the class of continuous functions (so long as we are referring to a closed interval).


THEOREM 5. A function y = f(x) continuous at every point of a 
closed interval [a, b] is uniformly continuous on this interval. 


For open intervals this theorem is false. For example, the function $y = \sin\frac{1}{x}$, clearly continuous at every point of the open interval (0, 1), cannot be uniformly continuous in this interval, since no matter how small we choose $\delta$ there exist (in the vicinity of zero) two points whose mutual distance is less than $\delta$, while the difference of the values assumed by the function $\sin\frac{1}{x}$ at these points is greater than one.
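The pairs of points in this argument can be written down explicitly. Using only the periodicity of the sine, $\sin\frac{1}{x}$ equals +1 at $x_1 = 1/(\frac{\pi}{2} + 2\pi k)$ and −1 at $x_2 = 1/(\frac{3\pi}{2} + 2\pi k)$, and the distance $x_1 - x_2$ shrinks roughly like $1/(4\pi k^2)$. The sketch below (function names are illustrative) searches for such a pair closer than any given $\delta > 0$; the values of the function at the two points always differ by about 2.

```python
import math


def extremal_pair(k):
    """Points of (0, 1) where sin(1/x) equals +1 and -1:
    1/x1 = pi/2 + 2*pi*k and 1/x2 = 3*pi/2 + 2*pi*k (k = 0, 1, 2, ...)."""
    x1 = 1.0 / (math.pi / 2 + 2 * math.pi * k)
    x2 = 1.0 / (3 * math.pi / 2 + 2 * math.pi * k)
    return x1, x2


def pair_within(delta):
    """Return the first extremal pair closer together than delta (assumes delta > 0).
    The distance behaves like 1/(4*pi*k**2), so the search terminates."""
    k = 0
    while True:
        x1, x2 = extremal_pair(k)
        if x1 - x2 < delta:
            return x1, x2
        k += 1


for delta in (1e-1, 1e-3, 1e-6):
    x1, x2 = pair_within(delta)
    gap = abs(math.sin(1 / x1) - math.sin(1 / x2))
    print(f"delta={delta:g}: |x1 - x2| = {x1 - x2:.3g}, |f(x1) - f(x2)| = {gap:.6f}")
```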


Proof of Theorem 5. Let us assume that y = f(x) is continuous at each point of the closed interval [a, b], and let $\varepsilon > 0$ be given. Then, for every point x of this interval, there exists a $\delta_x > 0$ such that $|f(x_1) - f(x_2)| < \varepsilon$ whenever $x_1$ and $x_2$ are contained in the interval $[x - \delta_x, x + \delta_x]$ (it suffices to choose $\delta_x$ so that $|f(t) - f(x)| < \varepsilon/2$ for all t in [a, b] with $|t - x| \le \delta_x$). Let us denote by $\Delta_x$ the open interval $(x - \frac{1}{2}\delta_x, x + \frac{1}{2}\delta_x)$. It is obvious that the family of intervals $\{\Delta_x\}$ covers [a, b]. Therefore, by the Heine-Borel lemma, there exists a finite subfamily M of the intervals $\{\Delta_x\}$ also covering the interval [a, b]. Let $\delta$ be the length of the smallest interval in the set M, and let $x_1$ and $x_2$ be any two points of [a, b] whose distance from each other is less than $\frac{1}{2}\delta$. We must show that $|f(x_1) - f(x_2)| < \varepsilon$. Since the point $x_1$ belongs to some interval $(x - \frac{1}{2}\delta_x, x + \frac{1}{2}\delta_x)$ of the family M, the point $x_2$, whose distance from $x_1$ is less than $\frac{1}{2}\delta \le \frac{1}{2}\delta_x$, belongs to the interval $[x - \delta_x, x + \delta_x]$, which, of course, also contains $x_1$. Hence, by the definition of $\delta_x$, we have $|f(x_1) - f(x_2)| < \varepsilon$.
19. CONTINUITY OF THE ELEMENTARY FUNCTIONS


You know, of course, that the sum and product of any finite num- 
ber of continuous functions is also a continuous function, and that 
the quotient of two continuous functions is continuous in every in- 
terval in which the denominator does not assume the value zero. 
All these theorems are valid in the local as well as in the global sense, 
that is, regardless of whether we have in mind continuity at a 
point, continuity on an interval, or uniform continuity. The proofs 
of all these theorems can be found in any textbook; they are not 
of fundamental interest and there is no need to dwell on them here. 

Much more interesting is the question of the continuity of the 
so-called elementary functions, that rather small class of functions 
which are used in elementary mathematics, but retain their sig- 
nificance in higher mathematics. To this class belong, first of all, all 
those functions whose values can be obtained from the independ- 
ent variable by the application of the six basic algebraic operations, 
and further, a small number of transcendental functions: the 
trigonometric functions (direct and inverse), the exponential and 
logarithmic functions, and all possible combinations of the above 
mentioned functions (for example, $x^2(1 - x\sin x)$ or $\cos 2^{x^2}$).

All elementary functions are continuous except for possible breaks in continuity at individual isolated points, called points of discontinuity. (For example, $\frac{1}{x}$ at $x = 0$ or $\tan x$ at $x = \frac{\pi}{2}$. Note, however, that the functions $\frac{1}{x}$ and $\tan x$ are continuous in their domains of definition. The points of discontinuity of which we are speaking are the points at which these functions are not defined. Regarding such points of discontinuity, see p. 65.) However, the proof of this assertion is not easy for all of the elementary functions. In most analysis courses this situation is not investigated fully. It is of great importance, however, and we shall have to make several remarks on the subject.

First of all, it is very important to establish the continuity of all the polynomial functions. This is done as follows. Since for every polynomial P(x) and for every number a the difference P(x) − P(a) is divisible by x − a (Bézout's theorem), we have $P(x) - P(a) = (x - a)Q(x)$, where Q(x) is a polynomial. Since Q(x) is clearly bounded in a neighborhood of the point a, we have $P(x) - P(a) \to 0$ as $x \to a$; that is, the polynomial P(x) is continuous at a. Furthermore, from the continuity of the polynomials there follows directly the continuity of every rational function $\frac{P_1(x)}{P_2(x)}$ in every interval where the polynomial $P_2(x)$ is never zero.

The problem is solved equally simply for the trigonometric functions. The formula

$$\sin x - \sin a = 2\cos\frac{x + a}{2}\,\sin\frac{x - a}{2}$$

and the consequent inequalities

$$|\sin x - \sin a| \le 2\left|\sin\frac{x - a}{2}\right| \le 2\left|\frac{x - a}{2}\right| = |x - a|$$
show that $\sin x \to \sin a$ as $x \to a$. Thus sin x is a continuous function, and quite analogously we establish the continuity of the cosine function. Hence the function

$$\tan x = \frac{\sin x}{\cos x}$$

is continuous at all values of x where $\cos x \ne 0$, that is, where $x \ne (2k + 1)\frac{\pi}{2}$, k an integer.
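Both continuity arguments above rest on an explicit identity or inequality, and both are easy to spot-check. The sketch below is illustrative only: `divide_by_linear` is a hypothetical helper that performs synthetic division (Horner's scheme) to produce the polynomial Q(x) with $P(x) - P(a) = (x - a)Q(x)$ from Bézout's theorem, and `sine_bound_holds` samples the inequality $|\sin x - \sin a| \le |x - a|$ on a grid, which is a check on finitely many points, not a proof.

```python
import math


def horner(coeffs, x):
    """Evaluate a polynomial whose coefficients are listed from the highest power down."""
    acc = 0
    for c in coeffs:
        acc = acc * x + c
    return acc


def divide_by_linear(coeffs, a):
    """Synthetic division: return (q_coeffs, p_a) such that
    P(x) - P(a) = (x - a) * Q(x), where Q has coefficients q_coeffs."""
    partial = []
    acc = 0
    for c in coeffs:
        acc = acc * a + c          # the Horner recurrence evaluated at x = a
        partial.append(acc)
    return partial[:-1], partial[-1]   # earlier values form Q; the last one is P(a)


def sine_bound_holds(points, tol=1e-12):
    """Check |sin x - sin a| <= |x - a| for every pair of sample points."""
    return all(abs(math.sin(x) - math.sin(a)) <= abs(x - a) + tol
               for x in points for a in points)


# P(x) = x**3 - 3x**2 + 2 at a = 2: Q(x) = x**2 - x - 2 and P(2) = -2.
q, p_a = divide_by_linear([1, -3, 0, 2], 2)
print(q, p_a)   # [1, -1, -2] -2
```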


The proof of the continuity of the exponential function $a^x$ is more complicated. Let us assume, for definiteness, that a > 1, so that the function $y = a^x$ increases with x.¹ We shall show first that $a^x \to a^0 = 1$ as $x \to 0$. Again for definiteness, let us assume that $x \to +0$, that is, we shall consider $a^x$ only for positive values of x. The case of $x \to -0$ is completely analogous. Thus for x > 0 we have $a^x > 1$, and in order to prove that $a^x \to 1$ as $x \to +0$ we have only to show that, no matter how small the positive number $\varepsilon$ may be, we have $a^x < 1 + \varepsilon$ for sufficiently small x. For every positive integer n we have the inequality

$$(1 + \varepsilon)^n = 1 + n\varepsilon + \cdots > n\varepsilon,$$

whence $(1 + \varepsilon)^n \to +\infty$ as $n \to \infty$. This means that for sufficiently large n we have the inequalities

$$(1 + \varepsilon)^n > a \quad\text{and}\quad a^{1/n} < 1 + \varepsilon.$$

¹The case of a < 1 may be reduced to that of a > 1 by writing $a^x = \frac{1}{b^x}$ with $b = \frac{1}{a} > 1$.
And since $a^x$ is an increasing function, for all sufficiently small positive x (namely, for $0 < x \le \frac{1}{n}$) we have the inequalities

$$1 < a^x < 1 + \varepsilon,$$

that is, $a^x$ tends to 1 as $x \to +0$.

The rest of the proof follows easily. For any point $\alpha$ the relation

$$a^x - a^\alpha = a^\alpha\left(a^{x - \alpha} - 1\right)$$

shows that as $x \to \alpha$ the function $a^x$ tends to $a^\alpha$. Hence this function is continuous at all values of x.
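The choice of n in this argument can be made explicit. Because $(1 + \varepsilon)^n \ge 1 + n\varepsilon$, any integer n with $n\varepsilon > a - 1$ already gives $(1 + \varepsilon)^n > a$, hence $a^{1/n} < 1 + \varepsilon$, and then $1 < a^x < 1 + \varepsilon$ for $0 < x \le 1/n$. The hypothetical helper `n_for_epsilon` below computes such an n (assuming a > 1 and ε > 0) so that the bound can be checked in floating point at a few sample points.

```python
import math


def n_for_epsilon(a, eps):
    """An integer n with n*eps > a - 1 (assumes a > 1 and eps > 0).
    Bernoulli's inequality (1 + eps)**n >= 1 + n*eps then gives
    (1 + eps)**n > a, i.e. a**(1/n) < 1 + eps."""
    return math.floor((a - 1) / eps) + 1


def exp_near_zero_ok(a, eps):
    """Check 1 < a**x < 1 + eps at a few sample points with 0 < x <= 1/n."""
    n = n_for_epsilon(a, eps)
    samples = [1 / n, 1 / (2 * n), 1 / (10 * n), 1e-9]
    return all(1 < a ** x < 1 + eps for x in samples)


print(n_for_epsilon(2, 0.1))   # 11
```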
The question of the continuity of the logarithmic and the inverse trigonometric functions can now be settled on the basis of a general theorem on the continuity of inverse functions.


THEOREM 6. If a function y = f(x) is continuous and increasing¹ on a closed interval [a, b], then the inverse function $x = \varphi(y)$ is also continuous on the interval $[\alpha, \beta]$, where $\alpha = f(a)$ and $\beta = f(b)$.


Proof. To show this, let $\gamma$ be any point of $[\alpha, \beta]$. We have to prove that

$$\lim_{y \to \gamma} \varphi(y) = \varphi(\gamma).$$

We first let $y \to \gamma - 0$; since the inverse function $\varphi(y)$, like the function f(x), is increasing, $\lim_{y \to \gamma - 0} \varphi(y)$ is sure to exist. Let us denote this limit by c, and we shall show that $c = \varphi(\gamma)$. Since f(x) is continuous and $\varphi(y) \to c$ as $y \to \gamma - 0$, it follows that as $y \to \gamma - 0$ we have $y = f[\varphi(y)] \to f(c)$, whence $f(c) = \gamma$. But it follows directly from this that

$$c = \varphi[f(c)] = \varphi(\gamma),$$

that is,

$$\lim_{y \to \gamma - 0} \varphi(y) = \varphi(\gamma).$$

And since in the same way we can also show that

$$\lim_{y \to \gamma + 0} \varphi(y) = \varphi(\gamma),$$

we have finally

$$\lim_{y \to \gamma} \varphi(y) = \varphi(\gamma), \quad \text{q.e.d.}$$

¹The conclusion is, of course, equally valid when f(x) is continuous and decreasing.
In order to be able to assert without special proof that composite elementary functions such as $\log\sin x$ and $2^{\sqrt{x^2 - 4x}}$ are continuous, we need the following theorem on the continuity of composite functions (functions of functions).


THEOREM 7. If y = f(x) is continuous on [a, b] and $z = \varphi(y)$ is continuous on $[\alpha, \beta]$, where $\alpha$ and $\beta$ are, respectively, the least and the greatest values of f(x) on the closed interval [a, b], then the function $z = \varphi[f(x)]$ is continuous on [a, b].


Proof. Let c be any point in [a, b], let $\gamma = f(c)$ and $\zeta = \varphi(\gamma) = \varphi[f(c)]$, and let W be any neighborhood of the point $\zeta$. By the continuity of $\varphi(y)$, there exists a neighborhood V of $\gamma$ such that $\varphi(y) \in W$ whenever $y \in V$. Finally, owing to the continuity of f(x), there exists a neighborhood U of c such that if $x \in U$, then $f(x) \in V$, and consequently $\varphi[f(x)] \in W$. Since W is an arbitrary neighborhood of $\zeta = \varphi[f(c)]$, it follows that

$$\lim_{x \to c} \varphi[f(x)] = \varphi[f(c)],$$

and the continuity of $\varphi[f(x)]$ at any point c in [a, b] is established.


20. OSCILLATION OF A FUNCTION AT A POINT 


The most convenient approach to the study of discontinuous 
functions is by introducing the concepts of oscillation in a given in- 
terval and oscillation at a given point. 

Suppose that a completely arbitrary function f(x) is defined on a closed interval [a, b] with the exception, perhaps, of a finite number of points. If this function is not bounded in the given interval, we shall say that its oscillation in this interval is equal to $+\infty$; if, however, the function is bounded, then by the oscillation of the function in [a, b] we shall mean the difference M − m between the l.u.b. M and the g.l.b. m of the function on this interval. In either case, we shall denote the oscillation of f(x) in [a, b] by the symbol $\omega_f(a, b)$.

If the interval [a′, b′] is contained in the interval [a, b] ($a \le a' < b' \le b$), then the bounds M′ and m′ of f(x) on [a′, b′] will evidently satisfy the inequalities $m \le m' \le M' \le M$, and hence $\omega_f(a', b') \le \omega_f(a, b)$. Consequently, if we select in the interval [a, b] any point c (the function may not even be defined at this point), take a neighborhood $[\alpha, \beta]$ of this point, and cause the end points $\alpha$ and $\beta$ of this neighborhood to approach c, then $\omega_f(\alpha, \beta)$, as a function of the interval, is nonincreasing.¹ And, since the oscillation $\omega_f(\alpha, \beta)$, being a nonnegative quantity, is bounded below, it follows by an argument similar to the proof of Lemma 1 in Lecture 1 (p. 17) that these oscillations tend to a limit. We shall call this limit the oscillation of f(x) at c, and denote it by $\omega_f(c)$. Thus

$$\omega_f(c) = \lim_{\alpha \to c,\ \beta \to c} \omega_f(\alpha, \beta).$$

The introduction of this concept allows us, first of all, to look in a 
new way at the continuity of a function at a given point, as the fol- 
lowing theorem shows. 


THEOREM 8. A necessary and sufficient condition for a function f(x) defined in a neighborhood of the point c to be continuous at that point is that $\omega_f(c) = 0$.


Proof. (i) Let $\omega_f(c) = 0$. This means that in a sufficiently small neighborhood $U = (\alpha, \beta)$ of c, the oscillation of f(x) becomes arbitrarily small. And since for every $x \in U$ we have

$$|f(x) - f(c)| \le M - m = \omega_f(\alpha, \beta),$$

where M and m are the bounds of f(x) in U, it follows that the value of $|f(x) - f(c)|$ will be as small as we please if x belongs to a sufficiently small neighborhood U of c. This, however, means that f(x) is continuous at c.

(ii) If $\omega_f(c) = \omega > 0$, then in every neighborhood U of c we have $M - m \ge \omega$. Therefore, there exist $\alpha$ and $\beta$ in U such that $f(\alpha) < m + \frac{\omega}{4}$ and $f(\beta) > M - \frac{\omega}{4}$, and consequently $f(\beta) - f(\alpha) > \frac{\omega}{2}$. But

$$f(\beta) - f(\alpha) = [f(\beta) - f(c)] + [f(c) - f(\alpha)],$$

and from $f(\beta) - f(\alpha) > \frac{\omega}{2}$ it follows that at least one of the two terms on the right side must be greater than $\frac{\omega}{4}$. Thus there exists a point x in U such that $|f(x) - f(c)| > \frac{\omega}{4}$, and since the neighborhood U is arbitrary, f(x) cannot be continuous at c.

¹ Here we understand by the term neighborhood of c any set which contains an open interval containing c.


Points at which a given function f(x) is continuous are called points of continuity of f(x), and points at which it is discontinuous, points of discontinuity. Our theorem then asserts that $\omega_f(c) = 0$ at points of continuity and $\omega_f(c) > 0$ at points of discontinuity.
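Theorem 8 suggests a rough numerical probe: estimate the oscillation over shrinking intervals around c and see whether it tends to zero. The helpers below are a hypothetical sketch, not a definition. Evaluating f at finitely many points can only underestimate the true l.u.b. minus g.l.b., so the numbers are lower estimates of $\omega_f$, and the jump function `sign` is an example chosen here for illustration.

```python
def oscillation_on(f, lo, hi, samples=2001):
    """Lower estimate of omega_f(lo, hi) = l.u.b. - g.l.b. from equally spaced samples."""
    step = (hi - lo) / (samples - 1)
    values = [f(lo + i * step) for i in range(samples)]
    return max(values) - min(values)


def oscillation_near(f, c, radii=(1e-1, 1e-2, 1e-3, 1e-4)):
    """Estimates of omega_f(c - r, c + r) for shrinking r; their trend suggests omega_f(c)."""
    return [oscillation_on(f, c - r, c + r) for r in radii]


def sign(x):
    """Jump function with values -1, 0, +1; its oscillation at 0 is 2."""
    return (x > 0) - (x < 0)


print(oscillation_near(sign, 0.0))             # [2, 2, 2, 2]: discontinuous at 0
print(oscillation_near(lambda x: x * x, 1.0))  # decreasing toward 0: continuous at 1
```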


21. POINTS OF DISCONTINUITY 


We are now in a position to attempt the introduction of some order into the multitude of exceedingly diverse types of discontinuity. Since the condition $\omega_f(c) = 0$ distinguishes points of continuity from points of discontinuity, we naturally expect that the quantity $\omega_f(c)$ will serve as a reasonable and convenient measure of the discontinuity of the function at this point. Our expectation becomes even stronger as we recall the definition of $\omega_f(c)$. Clearly, $\omega_f(a, b)$ measures the spread of the values assumed by f(x) in the interval [a, b]. We may therefore regard the limit of this quantity, as the interval contracts to the point c, as measuring the spread of the values assumed by f(x) at points arbitrarily close to c, that is, as measuring that characteristic of the function which is responsible for the discontinuity at c. Let us agree to call the quantity $\omega_f(c)$ the measure of discontinuity of f(x) at c. This makes it possible to compare different points of discontinuity according to the degree of discontinuity at these points. In particular, a function has its greatest discontinuity at those points where the oscillation is not bounded ($\omega_f(c) = +\infty$).

Associated with the concept of measure of discontinuity (or of 
oscillation at a point) is an important proposition which is a direct 
generalization of Theorem 5 on uniform continuity (p. 59). 


THEOREM 9. If the measure of discontinuity (the oscillation) of f(x) at every point of the interval [a, b] does not exceed $\lambda \ge 0$, then, no matter how small the positive number $\varepsilon$, there exists a $\delta > 0$ such that the oscillation of f(x) in any subinterval of length less than $\delta$ does not exceed $\lambda + \varepsilon$.


Such a function might be called continuous within a tolerance of $\lambda$; in particular, setting $\lambda = 0$ we obtain Theorem 5. You can prove the present theorem for yourself along the same lines which were used in the proof of Theorem 5, only here the number $\lambda + \varepsilon$ will play the part of $\varepsilon$ in the earlier proof.

Thus we see that not only continuity, but also the local prop- 
erty consisting of boundedness of the measure of discontinuity at 
a given point, has its equivalent global property. The theorem 
which we have just stated has a number of applications, particu- 
larly in the integral calculus, and we shall meet with it again in one 
of the following lectures. 

Also very important is another principle for the classification of 
points of discontinuity. The essence of this principle is the compari- 
son of discontinuities, not according to magnitude, but according 
to the form of the discontinuity. As we know, a criterion of con- 
tinuity of f(x) at a is given by the equalities f(a + 0) = f(a) = 
f(a — 0). It may happen that both one-sided limits f(a + 0) and 
f(a — 0) exist, but at least one of them differs from f(a). In such a 
case, we shall say that a is a point of discontinuity of the first kind of f(x). Thus, a point of discontinuity of the first kind is characterized by the fact that limits of f(x) exist as we approach this point
from either side, but either these limits are unequal or, though 
equal, differ from the value of the function at this point. An exam- 
ple of this first type is portrayed graphically in Figure 13 (here the 
value of the function at the point of discontinuity can be arbitrary); 
an example of the second type is shown in Figure 14. 








Fig. 13 Fig. 14 

The simplest type of point of discontinuity is that of the first 
kind; the relative ease of its investigation is, of course, the result of 
the existence of the limits f(a + 0) and f(a — 0). All other points 
of discontinuity are called points of discontinuity of the second 
kind. That is to say, at a point of discontinuity of the second kind 
the function fails to tend to a limit as the point is approached from 
at least one direction. 
An example of a discontinuity of the second kind is furnished by the behavior of the very instructive function $y = \sin\frac{1}{x}$ as $x \to 0$ (Fig. 15), which we have already mentioned more than once. And the Dirichlet function, which we mentioned at the beginning of this lecture, has a discontinuity of the second kind at every point.


22. MONOTONIC FUNCTIONS 


A function y = f(x) is called nondecreasing in a closed interval [a, b] if $f(x_1) \le f(x_2)$ whenever $a \le x_1 < x_2 \le b$; if under the same assumptions we always have $f(x_1) \ge f(x_2)$, the function f(x) is said to be nonincreasing in [a, b]. The nondecreasing and nonincreasing functions together form the class of monotonic functions.
Monotonic functions have a number of special properties which 
make them in many instances a very convenient tool of investigation. 

First of all, if f(x) is monotonic in an interval [a, b], it is bounded in this interval. (As usual, the interval is taken to be closed. For open intervals the assertion is not true: the function $y = \frac{1}{x}$ is monotonic but not bounded in the open interval (0, 1).) For if $a \le x \le b$, f(x) lies between f(a) and f(b), since the greatest and the least values of a monotonic function in the interval [a, b] are assumed at the end points. It follows that f(a) and f(b) are the g.l.b. and the l.u.b. of f(x) in this interval (in this order if f(x) is nondecreasing, in the reverse order if it is nonincreasing).
A monotonic function can have points of discontinuity, as a glance at Figure 13 shows. But the discontinuities of such a function are limited both as to their character and their number. First, because of their boundedness, monotonic functions cannot have discontinuities of infinite measure. Further, the number of points within the interval [a, b] at which the measure of discontinuity of a monotonic function f(x) exceeds any given positive number $\tau$ is not greater than

$$\frac{|f(b) - f(a)|}{\tau}.$$

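To see this, let us assume, for definiteness, that f(x) is nondecreasing, so that $f(a) \le f(b)$, and let us suppose that the function f(x) has within the interval [a, b] n points at which the measure of discontinuity exceeds $\tau$; we denote these n points, from left to right, by $c_1, c_2, \ldots, c_n$. Since for a nondecreasing function the oscillation on $[c_k - \varepsilon, c_k + \varepsilon]$ equals $f(c_k + \varepsilon) - f(c_k - \varepsilon)$ and is at least $\omega_f(c_k)$, we have, for an arbitrarily small $\varepsilon > 0$,

$$f(c_k + \varepsilon) - f(c_k - \varepsilon) > \tau \qquad (1 \le k \le n).$$

If (as is always possible) we choose $\varepsilon$ so small that $a < c_1 - \varepsilon$, $c_n + \varepsilon < b$, and

$$c_k + \varepsilon < c_{k+1} - \varepsilon \qquad (1 \le k \le n - 1),$$

i.e., so that no two of the neighborhoods $(c_k - \varepsilon, c_k + \varepsilon)$ overlap and all of them are wholly contained in [a, b], then we have

$$n\tau < \sum_{k=1}^{n} [f(c_k + \varepsilon) - f(c_k - \varepsilon)] \le f(b) - f(a),$$

whence

$$n < \frac{f(b) - f(a)}{\tau}.$$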

A monotonic function can thus have only a finite number 
of points at which the measure of its discontinuity exceeds a given 
number. In the case of nonmonotonic functions the situation is dif- 
ferent; for example, the measure of discontinuity of the Dirichlet 
function equals one at every point. 

Finally, if a function f(x) is monotonic in a closed interval [a, b], then by Lemma 1′ of Lecture 1 (p. 18) the limits f(c + 0) and f(c − 0) exist at every point c of this interval (except, of course, at the end points, where only one-sided limits exist). It follows from this that a monotonic function can have only points of discontinuity of the first kind.
23. FUNCTIONS OF BOUNDED VARIATION


Suppose f(x) is defined on a closed interval [a, b]. Let us subdivide this interval in any manner into n subintervals and denote the points of partition from left to right by $x_1, x_2, \ldots, x_{n-1}$. For convenience, let us set $a = x_0$ and $b = x_n$, so that

$$a = x_0 < x_1 < x_2 < \cdots < x_{n-1} < x_n = b.$$

The sum

$$S = \sum_{k=1}^{n} |f(x_k) - f(x_{k-1})|$$

depends on our partition of the interval [a, b], and this sum will, in general, have different values for different partitions. Performing every possible partition (changing the number n arbitrarily), we shall obtain an infinite set of sums S. The l.u.b. of this set (which may, of course, equal $+\infty$) is called the total variation of f(x) in the interval [a, b], and we shall denote it by $V_f(a, b)$.

If $V_f(a, b)$ is a number ($V_f(a, b) < +\infty$), we say that f(x) is a function of bounded variation on [a, b]. If $V_f(a, b) = +\infty$, we say that the variation of f(x) in [a, b] is infinite (or unbounded).

The class of functions of bounded variation plays a very important role in analysis and its applications. In particular, it is evident that every function monotonic in an interval [a, b] is a function of bounded variation on this interval, since for any monotonic function f(x) the sum S for any partition is equal to $|f(b) - f(a)|$. Hence $V_f(a, b) = |f(b) - f(a)|$.
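The sums S are easy to compute for any particular partition. In the sketch below (function names are illustrative), `variation_sum` evaluates S for a list of partition points. For the monotonic function x³ on [−1, 1] every partition gives $|f(1) - f(-1)| = 2$, while for x² the coarse partition {−1, 1} gives 0 and a partition through 0 is needed to reach the total variation 2. By the triangle inequality, refining a partition never decreases S, which is why $V_f$ is naturally defined as a l.u.b.

```python
def variation_sum(f, points):
    """S = sum of |f(x_k) - f(x_{k-1})| over partition points x_0 < x_1 < ... < x_n."""
    return sum(abs(f(q) - f(p)) for p, q in zip(points, points[1:]))


def uniform_partition(a, b, n):
    """Points a = x_0 < x_1 < ... < x_n = b with equal spacing."""
    return [a + (b - a) * k / n for k in range(n + 1)]


def cube(x):
    return x ** 3


def square(x):
    return x ** 2


print(variation_sum(cube, uniform_partition(-1, 1, 10)))   # 2, up to rounding
print(variation_sum(square, [-1, 1]), variation_sum(square, [-1, 0, 1]))  # 0 2
```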

Of fundamental importance in the investigation of the general 
properties of functions of bounded variation is a theorem, with the 
help of which the study of even the most general type of such func- 
tions can be completely reduced to the study of monotonic functions. 


THEOREM 10. Every function of bounded variation is the difference 
of two nondecreasing functions (or equivalently, the sum of two mono- 
tonic functions one of which is nondecreasing and the other 
nonincreasing). 


On the basis of this theorem, all fundamental properties of monotonic functions concerned with the number and character of possible points of discontinuity can at once be extended to functions of bounded variation.
In proving this fundamental theorem let us agree, for brevity, to write V(x) instead of $V_f(a, x)$; V(x) is finite, since any partition of [a, x] together with the point b yields a partition of [a, b] whose sum is at least as large. Furthermore, we let

$$P(x) = \tfrac{1}{2}[V(x) + f(x)] \quad\text{and}\quad N(x) = \tfrac{1}{2}[V(x) - f(x)],$$

from which

$$f(x) = P(x) - N(x).$$

Our theorem will then be proved if we show that the functions N(x) and P(x) are nondecreasing in the interval [a, b].

Proof. Let $a \le x_1 < x_2 \le b$. Since

$$2[P(x_2) - P(x_1)] = V(x_2) - V(x_1) + [f(x_2) - f(x_1)]$$

and

$$2[N(x_2) - N(x_1)] = V(x_2) - V(x_1) - [f(x_2) - f(x_1)],$$

it is sufficient for our purpose to show that

$$V(x_2) - V(x_1) \ge |f(x_2) - f(x_1)|. \qquad (5)$$

Let $\varepsilon$ be any positive number. By the definition of $V(x_1)$ as the l.u.b. of the sums S, there exists a partition of the interval $[a, x_1]$ such that the corresponding sum $S = S(a, x_1)$ exceeds $V(x_1) - \varepsilon$. But the sum $S(a, x_1) + |f(x_2) - f(x_1)| = S(a, x_2)$ is obviously one of the sums S for the interval $[a, x_2]$ (the one obtained by adding $x_2$ as a new point of partition) and therefore does not exceed the l.u.b. $V(x_2)$ of such sums. We thus obtain

$$V(x_1) - \varepsilon + |f(x_2) - f(x_1)| < S(a, x_1) + |f(x_2) - f(x_1)| = S(a, x_2) \le V(x_2),$$

from which

$$V(x_2) - V(x_1) > |f(x_2) - f(x_1)| - \varepsilon.$$

And, since $\varepsilon$ can be taken arbitrarily small, the inequality (5) follows and our theorem is established.
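A discrete analogue of this construction can be computed directly. On a grid $x_0 < x_1 < \cdots < x_m$, replace V(x) by the running sum of $|f(x_k) - f(x_{k-1})|$ along the grid; this is the sum S for the partition formed by the grid points, so it is only a lower estimate of the true V. Then $2[P(x_k) - P(x_{k-1})] = |\Delta f| + \Delta f \ge 0$ and $2[N(x_k) - N(x_{k-1})] = |\Delta f| - \Delta f \ge 0$, the grid version of inequality (5) (here with equality). The hypothetical helper `jordan_on_grid` is a sketch under these assumptions, using sin x on [0, 2π] as the example.

```python
import math


def jordan_on_grid(f, xs):
    """Grid analogue of P = (V + f)/2 and N = (V - f)/2, where V is replaced
    by the running variation of f along the increasing grid xs."""
    values = [f(x) for x in xs]
    running = [0.0]
    for p, q in zip(values, values[1:]):
        running.append(running[-1] + abs(q - p))
    P = [(v + y) / 2 for v, y in zip(running, values)]
    N = [(v - y) / 2 for v, y in zip(running, values)]
    return P, N, values


xs = [2 * math.pi * k / 200 for k in range(201)]
P, N, values = jordan_on_grid(math.sin, xs)
print(P[-1] + N[-1])   # running variation of sin over [0, 2*pi], close to 4
```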
4. Series


24. CONVERGENCE AND THE SUM OF A SERIES 


We shall devote this lecture to infinite series, one of the most 
important tools of mathematical analysis. As you know, in the 
great problems of mathematical analysis, infinite series appear as a 
purely technical tool of investigation, very useful and convenient, 
but rather modest in theoretical significance. Nevertheless, this tool 
fully deserves a special lecture even in our brief course. Not only do 
its many applications pervade the whole structure of analysis, as 
well as almost all applied sciences, but in the relatively simple 
subject matter of the theory of series, trains of thought and logical
patterns typical of the whole of analysis appear especially clearly. 
It is well known that for a student who has actively and thoroughly 
mastered the theory of series, the further study of the basic content 
of analysis will present no unusual difficulty. 

An infinite numerical series is an expression of the form

$$u_1 + u_2 + \cdots + u_n + \cdots \qquad (1)$$

or

$$\sum_{n=1}^{\infty} u_n,$$

where we shall assume all the $u_n$ to be real numbers. The numbers $u_n$ are called the terms of the series. And the finite sums

$$s_n = u_1 + u_2 + \cdots + u_n \qquad (n = 1, 2, \ldots)$$

are called the partial sums of the series (1).

The main question concerning any given series is that of its convergence. If the limit

$$\lim_{n \to \infty} s_n = s$$

exists, the series (1) is called convergent and the number s is called its sum. Otherwise, the series is called divergent and has no sum.
When $s_n \to +\infty$ or $s_n \to -\infty$ as $n \to \infty$, we are formally entitled
to include the series in either of these categories. But we usually 
consider such a series as divergent, so that by the sum of a series 
we always mean a number. However, in giving the name divergent 
to those series with infinite sums as well as to those which have no 
sum at all, we must remember that these two types of series basi- 
cally have nothing in common. They are considered together only 
because both are unlike (although in a different sense) series with 
finite sums. 

In its basic content, analysis deals only with convergent series. 
Students inevitably form their initial idea of the sum of a series by 
analogy with ordinary finite sums. And very frequently the lecturer 
himself encourages, or even directly fosters, such a notion. If ap- 
proached with adequate caution, this view may actually contribute 
to our comprehension of the subject. We must not, however, carry 
this analogy too far, and, above all, we must not attribute to it any 
power of proof. 

The analogy between infinite series and finite sums can be car- 
ried to considerable length only for so-called absolutely convergent 
series (which we shall discuss later). For conditionally convergent 
series, our tendency to regard their sums as analogous to finite 
sums is immediately confronted with contradicting evidence; in 
particular, it is difficult to regard as the sum of all the terms of
a series a value which can be changed by changing the order of the 
terms. Actually, the process of forming the sum of a series is not 
at all similar to the process of finite addition and does not consist 
(as some would imagine) in adding one term of the series after 
another until all of them are exhausted. The very term infinite im- 
plies the impossibility of exhausting the terms of a series by suc- 
cessive finite operations. But in spite of this, we can continue 
to speak of the sum of a series if we give up this hopeless process 
of infinite addition and substitute for it the completely different 
operation of passage to the limit. 

We begin, as in the case of finite sums, with successive addition, forming the sums $s_1$, $s_2$, etc. But we do not intend to continue this process without limit. While forming these partial sums, we examine carefully their nature and structure, and, in particular, their dependence on the number n. In other words, the subject of our study is the quantity $s_n$ as a function of the variable n. In many cases, we obtain the whole picture of this functional dependence at once by obtaining a convenient analytical expression for the function $s_n$ (for example, the geometric progression familiar from elementary algebra). More precisely: we try to find out whether the quantity $s_n$ approaches a limit as $n \to \infty$, and if so, what limit. Thus the determination of the sum of an infinite series is composed of two successive phases: (a) formation of the partial sums $s_n$ and investigation of their dependence on n, and (b) passage to the limit as $n \to \infty$. You can see that this has little resemblance to addition until the terms are completely exhausted, and we must keep this difference in mind to avoid the always present danger of being led into error by unfounded analogies.
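The two phases can be seen on the geometric progression $u_n = r^{n-1}$, for which $s_n = \frac{1 - r^n}{1 - r}$ when $r \ne 1$. The sketch below (the generator names are introduced here for illustration) carries out phase (a) exactly with Python's `fractions.Fraction`; phase (b) is then read off the closed form: for $|r| < 1$, $r^n \to 0$ and $s_n \to \frac{1}{1 - r}$.

```python
from fractions import Fraction
from itertools import islice


def partial_sums(terms):
    """Phase (a): yield s_1, s_2, ... for an iterable of terms u_1, u_2, ..."""
    s = 0
    for u in terms:
        s += u
        yield s


def geometric_terms(r):
    """u_n = r**(n - 1) for n = 1, 2, ..."""
    u = Fraction(1)
    while True:
        yield u
        u *= r


r = Fraction(1, 2)
sums = list(islice(partial_sums(geometric_terms(r)), 10))
print(sums[-1], 1 / (1 - r))   # 1023/512 2
```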

We see that the problem of the sum of a given series (1) reduces 
entirely to the problem of the limit of the sequence 


$$s_1, s_2, \ldots, s_n, \ldots \qquad (2)$$

associated with this series. Conversely, given any sequence (2) we can always reduce the problem of the limit of this sequence to the problem of the sum of the series (1) by setting

$$u_1 = s_1, \quad u_2 = s_2 - s_1, \quad u_3 = s_3 - s_2, \quad \ldots, \quad u_n = s_n - s_{n-1}, \quad \ldots$$


Viewed in this manner, the basic problem of the theory of series 
does not differ at all from the basic problem of the theory of 
sequential limits. In fact, it is only this particular approach to the 
study of sequences, considering them as the successive partial sums 
of series, which gives rise to a distinct theory with special features, 
problems, and methods. Yet this approach is interesting enough 
from the theoretical point of view and important enough with 
respect to applications to justify a special theory, the theory of in- 
finite series. 
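The correspondence between sequences and series amounts to a pair of mutually inverse operations: differencing, $u_n = s_n - s_{n-1}$ (with $s_0 = 0$), and accumulation. The sketch below uses the sequence $s_n = \frac{n}{n+1}$, chosen here only as an example; differencing turns it into the series with terms $\frac{1}{n(n+1)}$, whose sum is therefore $\lim s_n = 1$.

```python
from fractions import Fraction
from itertools import accumulate


def terms_from_sums(sums):
    """u_1 = s_1 and u_n = s_n - s_{n-1}: the series whose partial sums are `sums`."""
    return [s - prev for prev, s in zip([0] + sums[:-1], sums)]


def sums_from_terms(terms):
    """s_n = u_1 + ... + u_n."""
    return list(accumulate(terms))


s = [Fraction(n, n + 1) for n in range(1, 21)]
u = terms_from_sums(s)
print(u[:3])   # [Fraction(1, 2), Fraction(1, 6), Fraction(1, 12)]
```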
25. CAUCHY’S CONDITION FOR CONVERGENCE
The reader is already familiar with the following proposition: 
THEOREM 1. If a series (1) is convergent, then $u_n \to 0$ as $n \to \infty$.

For, since

$$u_n = s_n - s_{n-1} \quad\text{when } n > 1$$

and the sums $s_n$ and $s_{n-1}$ have the same limit as $n \to \infty$, it follows that $u_n \to 0$. This property of convergent series is important because in many cases it allows us to establish easily the divergence of a series (by showing that $u_n$ is not infinitely small as $n \to \infty$). On the other hand, from the fact that $u_n \to 0$ as $n \to \infty$, it does not follow that the series (1) is convergent. For example, the harmonic series

$$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} + \cdots$$

is divergent ($s_n \to +\infty$ as $n \to \infty$) even though its nth term is infinitely small as $n \to \infty$. This one-sidedness of the property in question (necessity without sufficiency) of course limits its scope of application considerably.

The problem of the convergence of a series is, as we have seen, 
only a special form of the problem of the existence of the limit of a 
numerical sequence. Therefore, we obtain a necessary and sufficient 
condition for the convergence of a series by translating into the 
language of the theory of series the already familiar Cauchy’s con- 
dition for the existence of the limit of a sequence (Lecture 2, p. 32). 

We know that in order for the sequence of partial sums $s_n$ to approach a limit as $n \to \infty$, it is necessary and sufficient that for any $\varepsilon > 0$ the inequality

$$|s_{n+k} - s_n| < \varepsilon$$

be satisfied for all sufficiently large n and any k. Since

$$s_{n+k} - s_n = \sum_{i=n+1}^{n+k} u_i,$$

we have the following theorem. 
CAUCHY’S CONDITION. For the convergence of the series (1) it is necessary and sufficient that for any $\varepsilon > 0$ the inequality

$$|u_{n+1} + u_{n+2} + \cdots + u_{n+k}| < \varepsilon \qquad (3)$$

be satisfied for all sufficiently large n and any k.


In descriptive terms, the requirement of Cauchy’s condition is 
that the absolute value of any segment of the series, however long, 
becomes as small as we please, provided that the segment is located 
far enough to the right. It is evident, as in the theory of limits, that 
Cauchy’s condition can only establish the existence of the sum of a 
series and does not tell us anything about its value. 
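Cauchy's condition can be probed, though never verified, since only finitely many n and k can be tried, by computing segment sums $u_{n+1} + \cdots + u_{n+k}$ exactly. For the harmonic series the segment with k = n always satisfies $\frac{1}{n+1} + \cdots + \frac{1}{2n} \ge n \cdot \frac{1}{2n} = \frac{1}{2}$, so condition (3) fails for $\varepsilon = \frac{1}{2}$ however far to the right the segment lies; this is one way to see the divergence of the harmonic series. For the geometric series with terms $2^{-i}$, every segment starting after index n is smaller than $2^{-n}$. The helper name `segment_sum` is chosen for this sketch.

```python
from fractions import Fraction


def segment_sum(u, n, k):
    """u(n + 1) + u(n + 2) + ... + u(n + k), computed exactly."""
    return sum(u(i) for i in range(n + 1, n + k + 1))


def harmonic(i):
    return Fraction(1, i)


def geometric(i):
    return Fraction(1, 2 ** i)


# Harmonic: segments of length k = n never drop below 1/2.
print([float(segment_sum(harmonic, n, n)) for n in (1, 10, 100)])
# Geometric: a segment starting after n stays below 2**-n, whatever its length k.
print(float(segment_sum(geometric, 10, 1000)), 2 ** -10)
```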

Cauchy’s condition, although a very powerful tool in general 
theoretical investigations, is applied relatively seldom to prove the 
convergence of a specific series (as we have already remarked in an 
analogous connection in Lecture 2). The reason for this is that it is 
usually not easy to establish whether condition (3) is satisfied by a 
given series. Therefore, for use in the theory of series, we construct 
a large number of other criteria for convergence. These tests, un- 
like Cauchy’s condition, are not characteristic (both necessary and 
sufficient), but are applied to specific series with incomparably 
greater ease. 

The majority of these criteria pertain to series with positive 
terms. For expositional reasons, it is more convenient to inves- 
tigate this simplest, and at the same time most important, class of 
series at the very beginning. We do this in order to be able later to 
reduce, as far as possible, the study of series to the study of only 
this simple type. 


26. SERIES WITH POSITIVE TERMS 


If all the terms of a series (1) are positive, then $s_n$ is clearly a nondecreasing function of the variable n. By Lemma 1 of Lecture 1, there exist two possibilities: either the series converges or $\lim_{n \to \infty} s_n = +\infty$. In other words, the series converges or diverges, depending on whether the quantity $s_n$ is bounded or unbounded as $n \to \infty$. All the criteria for convergence of series in this class are, therefore, established by showing that under the appropriate conditions the sequence $s_n$ is bounded.

Underlying the majority of these criteria is the following
extremely powerful comparison test:

THEOREM 2. If all the terms of the series

$$u_1 + u_2 + \cdots + u_n + \cdots \qquad (A)$$

and

$$v_1 + v_2 + \cdots + v_n + \cdots \qquad (B)$$

are nonnegative and if for $n > n_0$ we have the inequality

$$u_n \le v_n, \qquad (C)$$

then the convergence of the series (B) implies the convergence of the
series (A).


Proof. The proof of this basic fact is very simple. Let us denote
by $s_n$ and $s_n'$, respectively, the partial sums of the series (A) and
(B). If (B) is convergent, the quantity $s_n'$ is bounded. And since by
virtue of condition (C) we have, for $n > n_0$,

$$s_n - s_{n_0} \le s_n' - s_{n_0}',$$

it follows that the quantity $s_n \le s_n' + (s_{n_0} - s_{n_0}')$ is also bounded
and, consequently, the series (A) converges.

Taking the series (B) to be any series whose convergence has al- 
ready been established, it is often possible to prove that some 
specific condition on a series (A) implies the relation (C) between 
its terms and those of the series (B). Such conditions may then 
serve as criteria for convergence. For example, if for (B) we choose 
an ordinary geometric progression, we obtain, at once, the simplest 
tests of convergence, those of d’Alembert, Cauchy, and others. 


These, as well as all other tests, in one way or another, require 
the terms of the series to decrease rapidly enough as n increases. For 


example, the familiar test of d’Alembert requires that the ratio $u_{n+1}/u_n$
be bounded by a number smaller than one for sufficiently large $n$.
This guarantees in an obvious way the sufficiently rapid decrease
of $u_n$ as $n$ increases.
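As an illustration of d'Alembert's test reduced to the comparison test, the sketch below uses the sample series $\sum n^2/2^n$, which is our choice and not the text's. Exact rational arithmetic (`fractions.Fraction`) lets us check that the ratio $u_{n+1}/u_n$ stays at or below $q = 3/4$ for $n \ge 5$ (up to a cutoff), hence $u_n \le u_5 q^{n-5}$, which is condition (C) with a geometric series in the role of (B). The value 6 quoted for the full sum comes from the identity $\sum n^2 x^n = x(1+x)/(1-x)^3$ at $x = 1/2$.

```python
from fractions import Fraction


def u(n):
    """Terms of the sample series sum n^2 / 2^n, kept as exact rationals."""
    return Fraction(n * n, 2**n)


q = Fraction(3, 4)  # ratio bound: u(n+1)/u(n) = (n+1)^2 / (2 n^2) <= 36/50 for n >= 5
n0 = 5

# d'Alembert's condition: the ratio stays at or below q < 1 from n0 on (checked up to a cutoff).
ratio_bound_holds = all(u(n + 1) / u(n) <= q for n in range(n0, 500))

# Consequence used by the comparison test: u(n) <= u(n0) * q^(n - n0), a geometric series (B).
geometric_bound_holds = all(u(n) <= u(n0) * q ** (n - n0) for n in range(n0, 500))

partial_sum = float(sum(u(n) for n in range(1, 201)))
print(ratio_bound_holds, geometric_bound_holds, partial_sum)  # the full sum is known to be 6
```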
We need not discuss here the various tests of this kind; you will 
find many of them, together with complete proofs, in every text- 
book. Instead of occupying ourselves with this primarily formal 
theory, we shall try to examine more carefully our notions
concerning the rapidity of convergence of series with positive terms.
This problem, though almost never discussed in courses for pro- 
spective technicians, actually has a direct practical significance, in 
addition to its theoretical interest. For if a series converges slowly,
that is, if in order to obtain a sum $s_n$ sufficiently close to the limiting
value $s$ it is necessary to add a very large number $n$ of terms,
such a series cannot be used as a tool for the approximate evalua- 
tion of the number s. Therefore, from a practical point of view it is 
not very useful. But let us observe that occasionally we come 
across the opposite situation: a divergent series may, under certain 
conditions, prove to be a very convenient tool for the practical 
computation of certain quantities. This fact prompted the celebrated
scientist Henri Poincaré, who investigated this phenomenon,
to express the paradoxical idea that convergent series sometimes
turn out to be practically divergent and, conversely, divergent 
series to be practically convergent. 

For every convergent series, the remainder $r_n = s - s_n = u_{n+1} + u_{n+2} + \cdots$
is infinitesimal as $n$ tends to infinity. The rapidity
of convergence of the series depends entirely on the order of magnitude
of this infinitely small quantity or, more precisely, on how
rapidly it tends to zero as $n$ tends to infinity. It is usually considered
that a series converges with good rapidity if, as $n \to \infty$, the
remainder $r_n$ decreases as rapidly as the $n$th term of a geometric
series. For this to occur, it is sufficient that the terms $u_n$ of the
given series should not exceed the corresponding terms of some
geometric series for sufficiently large $n$. Indeed, from the inequality


$$u_n \le a q^n \qquad (n \ge n_0),$$

where $a$ and $q$ are constants ($a > 0$, $0 < q < 1$), it follows that for
$n \ge n_0$ we also have the inequality

$$r_n \le a\left(q^{n+1} + q^{n+2} + \cdots\right) = \frac{a q^{n+1}}{1 - q}.$$
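A quick numerical check of this bound, assuming the sample series $\sum 1/(n\,2^n)$, whose sum is $\ln 2$ (the standard expansion of $-\ln(1-x)$ at $x = 1/2$). Here $u_n \le (1/2)^n$, so $a = 1$ and $q = 1/2$, and the bound reads $r_n \le (1/2)^n$. The helper name `remainder` is ours.

```python
import math


def remainder(n):
    """r_n = s - s_n for the sample series sum 1/(k 2^k), whose sum is s = ln 2."""
    s_n = math.fsum(1 / (k * 2**k) for k in range(1, n + 1))
    return math.log(2) - s_n


a, q = 1.0, 0.5  # u_n = 1/(n 2^n) <= a q^n for every n >= 1
for n in (1, 5, 10, 20):
    bound = a * q ** (n + 1) / (1 - q)
    print(n, remainder(n), bound)
```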

Much less rapid convergence characterizes the series where the
remainder $r_n$ decreases as some negative power of the number
$n$. This convergence is already so slow that the use of such a series
for practical purposes often presents difficulties. It should be noted,
in this connection, that if $u_n$ is less than $a n^{-k}$ (where $a > 0$ and
$k > 1$ are constants, and $k$ need not be an integer) for $n$ sufficiently
large, then $r_n < b n^{-k+1}$, where $b$ is some positive constant. For, by
applying Lagrange’s mean value theorem to the function $f(x) = x^{k-1}$,
we obtain for $x > 0$:

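$$(x+1)^{k-1} - x^{k-1} = (k-1)(x+\theta)^{k-2} \ge (k-1)x^{k-2} \qquad (0 < \theta < 1).$$

Substituting $x = n - 1$ and dividing both sides by $n^{k-1}(n-1)^{k-1}$,
we obtain¹

$$(n-1)^{-k+1} - n^{-k+1} = \frac{n^{k-1} - (n-1)^{k-1}}{n^{k-1}(n-1)^{k-1}} \ge \frac{(k-1)(n-1)^{k-2}}{n^{k-1}(n-1)^{k-1}} = \frac{k-1}{n^{k-1}(n-1)} > \frac{k-1}{n^k},$$

and consequently

$$\frac{1}{n^k} < \frac{(n-1)^{-k+1} - n^{-k+1}}{k-1}.$$

If $u_n < a n^{-k}$, this gives us

$$u_n < \frac{a}{n^k} < \frac{a}{k-1}\left[(n-1)^{-k+1} - n^{-k+1}\right].$$

Therefore,

$$r_n = \sum_{i=n+1}^{\infty} u_i < \frac{a}{k-1} \sum_{i=n+1}^{\infty} \left[(i-1)^{-k+1} - i^{-k+1}\right] = \frac{a}{k-1}\, n^{-k+1},$$

which proves our assertion.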
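The two facts just derived can be tested numerically. The sketch below (helper names are hypothetical) first checks the key inequality $1/n^k < [(n-1)^{-k+1} - n^{-k+1}]/(k-1)$ for a few values of $k$, including one with $1 < k < 2$; it then takes $u_n = 1/n^2$, so that $a = 1$, $k = 2$, and the estimate becomes $r_n < 1/n$, using the known sum $\pi^2/6$ to compute the remainders.

```python
import math


def telescoping_gap(n, k):
    """((n-1)^(1-k) - n^(1-k)) / (k - 1), the quantity shown above to exceed 1/n^k."""
    return ((n - 1) ** (1 - k) - n ** (1 - k)) / (k - 1)


def key_inequality_holds(ks, n_max):
    """Check 1/n^k < telescoping_gap(n, k) for n = 2, ..., n_max - 1."""
    return all(n ** -k < telescoping_gap(n, k) for k in ks for n in range(2, n_max))


def inverse_square_remainders(n_max):
    """Yield (n, r_n) for u_n = 1/n^2, using the known sum pi^2 / 6."""
    s = math.pi ** 2 / 6
    partial = 0.0
    for n in range(1, n_max + 1):
        partial += 1 / n**2
        yield n, s - partial


print(key_inequality_holds((1.5, 2.0, 3.7), 500))
# With a = 1 and k = 2 the bound reads r_n < 1/n.
for n, r_n in inverse_square_remainders(1000):
    if n in (2, 10, 100, 1000):
        print(n, r_n, 1 / n)
```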

Returning to the problem of tests for convergence, we first 
make the following observation: in every convergent series we have 
$u_n \to 0$ (as $n \to \infty$), but even for series with positive terms this
convergence of the general term $u_n$ to zero need not be monotonic. A
simple example is obtained from the geometric series by interchanging
the terms $u_1$ and $u_2$, $u_3$ and $u_4$, and so on, as in the series

$$\frac{1}{2^2} + \frac{1}{2} + \frac{1}{2^4} + \frac{1}{2^3} + \frac{1}{2^6} + \frac{1}{2^5} + \cdots.$$


¹For $1 < k < 2$, the last inequality of the preceding expression must be changed to
$(k-1)(x+\theta)^{k-2} > (k-1)(x+1)^{k-2}$. We can, however, still obtain the inequality

$$(n-1)^{-k+1} - n^{-k+1} > \frac{k-1}{n^k}.$$

However, in the great majority of explicit numerical series with
positive terms which occur in analysis, we have $u_{n+1} \le u_n$
($n = 1, 2, \ldots$), that is, $u_n$ decreases monotonically as $n$ increases.
For this reason, a group of special tests for convergence of series 
with positive and monotonically decreasing terms deserves con- 
siderable attention; especially so, since this group contains many 
criteria which are characteristic (necessary and sufficient) and at the 
same time very convenient to apply. We shall consider closely two 
tests of this kind. 


THEOREM 3. (Cauchy’s integral test). Let $f(x)$ be positive, continuous,
and nonincreasing on the half-line $0 \le x < +\infty$. Then for the
convergence of the series

$$f(1) + f(2) + \cdots + f(n) + \cdots \qquad (A)$$

it is necessary and sufficient that the integral

$$\int_0^a f(u)\,du \qquad (a > 0) \qquad (B)$$

have a finite limit as $a \to +\infty$. (This is obviously equivalent to the
condition that the integral remain bounded as $a \to +\infty$.)


For convenience in presenting the proof, we have formulated the 
theorem by taking as our starting point the given function, not the 
given series. It is clear, however, that for any given series (A) with 
positive and monotonically decreasing terms one can easily construct
any number of positive, continuous, and nonincreasing functions
$f(x)$ such that $f(n) = u_n$ (where $n \ge 1$). Hence our theorem is
indeed a characteristic test for convergence in series of this type. 

Proof. Since $f(x)$ is monotonic (nonincreasing), we have

$$\int_{n-1}^{n} f(u)\,du \ge f(n) \ge \int_{n}^{n+1} f(u)\,du,$$

from which it follows that

$$\int_0^n f(u)\,du \ge \sum_{k=1}^{n} f(k) \ge \int_1^{n+1} f(u)\,du.$$

These inequalities show immediately that as $n \to \infty$ the quantities

$$\sum_{k=1}^{n} f(k) \quad\text{and}\quad \int_0^n f(u)\,du$$

are either simultaneously bounded or simultaneously unbounded.
This proves the theorem, since for both quantities boundedness is
equivalent to the existence of a finite limit.
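The two-sided estimate in the proof can be watched directly. In this sketch we assume the sample function $f(x) = 1/(1 + x^2)$, which is positive, continuous, and nonincreasing on $0 \le x < +\infty$ and has the exact integral $\arctan b - \arctan a$ over $[a, b]$. Each partial sum lies between $\int_1^{n+1} f$ and $\int_0^n f$, and since the latter never exceeds $\pi/2$, the series converges. The function names are ours.

```python
import math


def f(x):
    """Sample function: positive, continuous, nonincreasing on [0, +inf)."""
    return 1 / (1 + x * x)


def integral(lo, hi):
    """Exact integral of f from lo to hi (antiderivative arctan)."""
    return math.atan(hi) - math.atan(lo)


def brackets(n_max):
    """Yield (integral over [0, n], sum_{k<=n} f(k), integral over [1, n+1])."""
    partial = 0.0
    for n in range(1, n_max + 1):
        partial += f(n)
        yield integral(0, n), partial, integral(1, n + 1)


upper, s, lower = list(brackets(10_000))[-1]
print(lower, s, upper)  # both integrals stay below pi/2, so the partial sums are bounded
```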

The application of this test to individual series is in many cases
quite simple. Suppose, for example, that $u_n = n^{-\alpha}$ ($\alpha > 0$). Setting

$$f(x) = \begin{cases} 1 & \text{if } 0 \le x \le 1, \\ x^{-\alpha} & \text{if } 1 < x < +\infty, \end{cases}$$

we obviously obtain a function satisfying the conditions of our
theorem. If $\alpha \ne 1$, we have

$$\int_0^a f(x)\,dx = \int_0^1 dx + \int_1^a x^{-\alpha}\,dx = 1 + \frac{a^{1-\alpha} - 1}{1 - \alpha}.$$

Thus, by virtue of the integral test, it follows that our series converges
when $\alpha > 1$ and diverges when $\alpha < 1$. If $\alpha = 1$, we have

$$\int_0^a f(x)\,dx = 1 + \int_1^a \frac{dx}{x} = 1 + \ln a \to +\infty$$

as $a \to \infty$, which proves that the harmonic series $\sum_{n=1}^{\infty} \frac{1}{n}$ diverges.
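For this example the bounds from the proof of Theorem 3 can be computed in closed form: $\sum_{k=1}^{n} k^{-\alpha} \le \int_0^n f(x)\,dx$, and for $\alpha = 1$ also $\sum_{k=1}^{n} 1/k \ge \int_1^{n+1} dx/x = \ln(n+1)$. The following sketch (function names are our own) tabulates both sides for a few values of $\alpha$.

```python
import math


def p_series_partial(alpha, n):
    """s_n = 1 + 2^(-alpha) + ... + n^(-alpha)."""
    return math.fsum(k ** -alpha for k in range(1, n + 1))


def integral_up_to(alpha, n):
    """Integral of f from 0 to n, with f = 1 on [0, 1] and x^(-alpha) beyond 1."""
    if alpha == 1:
        return 1 + math.log(n)
    return 1 + (n ** (1 - alpha) - 1) / (1 - alpha)


for alpha in (0.5, 1, 2):
    for n in (10, 1000, 100_000):
        print(alpha, n, p_series_partial(alpha, n), integral_up_to(alpha, n))
# For alpha = 2 the upper bound never exceeds 2; for alpha <= 1 both columns grow without bound.
```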

THEOREM 4. If $u_n \ge u_{n+1} > 0$ ($n \ge 1$), then for the convergence
of the series (1), it is necessary and sufficient that the series

$$u_1 + 2u_2 + 4u_4 + 8u_8 + \cdots + 2^n u_{2^n} + \cdots \qquad (4)$$

be convergent.


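Proof. Since the terms $u_n$ of the series decrease monotonically,
it follows that

$$2^{n-1} u_{2^{n-1}} \ge \sum_{k=2^{n-1}+1}^{2^n} u_k \ge 2^{n-1} u_{2^n}$$

(note that the number of terms in the middle part of the inequality
is $2^{n-1}$), from which, adding these inequalities for $n = 1, 2, \ldots, m$, we get

$$\sum_{k=0}^{m-1} 2^k u_{2^k} \ge \sum_{k=2}^{2^m} u_k \ge \frac{1}{2} \sum_{k=1}^{m} 2^k u_{2^k}.$$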


These inequalities show immediately that the partial sums of the 
series (1) and (4) are either simultaneously bounded or simultane- 
ously unbounded, thus establishing the theorem. 
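The chain of inequalities in this proof is easy to check numerically. The sketch below (the name `condensation_bounds` is hypothetical) evaluates its three members for $u_k = 1/k$, where the outer bounds are exactly $m$ and $m/2$ so that the middle sum must grow without bound, and for $u_k = k^{-3/2}$, where all three stay bounded.

```python
import math


def condensation_bounds(u, m):
    """Return (upper, middle, lower) from the proof of Theorem 4:

    sum_{k=0}^{m-1} 2^k u(2^k) >= sum_{k=2}^{2^m} u(k) >= (1/2) sum_{k=1}^{m} 2^k u(2^k).
    The function u must be positive and nonincreasing.
    """
    upper = math.fsum(2**k * u(2**k) for k in range(m))
    middle = math.fsum(u(k) for k in range(2, 2**m + 1))
    lower = 0.5 * math.fsum(2**k * u(2**k) for k in range(1, m + 1))
    return upper, middle, lower


for m in (4, 8, 16):
    print(m, condensation_bounds(lambda k: 1 / k, m))      # harmonic: upper = m, lower = m/2
    print(m, condensation_bounds(lambda k: k ** -1.5, m))  # alpha = 1.5: all three stay bounded
```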


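For our previous example where $u_n = n^{-\alpha}$ ($\alpha > 0$), the series (4)
has the general term $2^{n(1-\alpha)}$, so that (4) is a geometric series with ratio $2^{1-\alpha}$.
It now follows that the series $\sum_{n=1}^{\infty} n^{-\alpha}$
converges for $\alpha > 1$ and diverges for $\alpha \le 1$.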

27. ABSOLUTE AND CONDITIONAL CONVERGENCE


We now turn to series whose terms have arbitrary sign. As you 
well know, it is customary to separate convergent series of this 
general type into two classes with essentially different properties: 
absolutely convergent series and conditionally convergent series. 

We say that the series (1) is absolutely convergent if the series 


$$|u_1| + |u_2| + \cdots + |u_n| + \cdots \qquad (5)$$

converges. If, however, the series (1) converges, but the series (5)
diverges, we say that (1) converges conditionally.

We should note, first of all, that the absolute convergence of a 
given series is defined as the convergence of some other series. 
Therefore, the following theorem (which of course is well known 
to you) is by no means trivial: 


THEOREM 5. Every absolutely convergent series converges. 


The meaning of the theorem is that the convergence of the 
series (5) implies the convergence of the series (1), and the simplest 
way of proving it is by means of Cauchy’s condition. 


Proof. For any $n > 0$ and $k > 0$ we have

$$\left|\sum_{i=1}^{k} u_{n+i}\right| \le \sum_{i=1}^{k} |u_{n+i}|,$$

and by Cauchy’s condition the convergence of the series (5) implies
that the right-hand side is arbitrarily small for all sufficiently large
$n$ and all $k$. The same is also true for the left-hand side of the inequality,
and the convergence of (1) follows from another application
of the Cauchy condition.

Among the properties of absolutely convergent series is an ex- 
tensive resemblance to finite sums. They can be multiplied term by 
term like finite sums (the distributive law), their terms can be 
arranged in an arbitrary order (the commutative law), and their 
terms can be grouped without affecting the convergence of the 
series or changing its sum (the associative law). In particular, all 
these properties are obviously possessed by series with positive 
terms, whose convergence is always absolute. 

We shall not dwell here on the proofs of these properties. They 
are purely formal, and those given in most textbooks will present no 
difficulty. 

Let us look more carefully, however, at conditionally convergent
series. Consider, for example, the well-known series of Leibniz

$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \cdots.$$

Here everything turns out well: the partial sums tend to a limit and the
remainder tends to zero (and rather rapidly at that), just as for any 
convergent series. But how unstable this convergence is! It is 
enough to change suitably the order of the terms and it will have a 
different sum, and can even become divergent. And all this because 
the corresponding series (5), which is the harmonic series in this 
case, is divergent. You can see that a summation whose result de- 
pends upon the order of the terms cannot lead to expressions at all 
analogous to finite sums. The very term conditional convergence 
probably stems from a desire to indicate that the convergence here 
is not altogether without reservation. 
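A small computation shows this instability for the series above. The sketch below (function names are ours) sums $1 - \frac12 + \frac13 - \cdots$ in its natural order, which tends to $\ln 2$, and then the same terms taken two positive, one negative at a time, $1 + \frac13 - \frac12 + \frac15 + \frac17 - \frac14 + \cdots$; the classical value $\frac32 \ln 2$ for the latter is quoted here, not derived.

```python
import math


def alternating_harmonic(n_terms):
    """Partial sum 1 - 1/2 + 1/3 - ... with n_terms terms in the original order."""
    return math.fsum((-1) ** (k + 1) / k for k in range(1, n_terms + 1))


def two_positive_one_negative(n_blocks):
    """The same terms reordered as 1 + 1/3 - 1/2 + 1/5 + 1/7 - 1/4 + ...,
    summed over n_blocks blocks of (two positive terms, one negative term)."""
    terms = []
    for j in range(1, n_blocks + 1):
        terms += [1 / (4 * j - 3), 1 / (4 * j - 1), -1 / (2 * j)]
    return math.fsum(terms)


print(alternating_harmonic(200_000), math.log(2))
print(two_positive_one_negative(100_000), 1.5 * math.log(2))
```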

As to changing the sum and even destroying the convergence 
of the series by rearranging its terms, any conditionally convergent 
series offers us unlimited possibilities. This is shown clearly by the
following simple but remarkable theorem. 


THEOREM 6. If a series (1) is conditionally convergent and $\alpha$ is an
arbitrary number or one of the symbols $\pm\infty$, then, by a suitable
rearrangement of the terms of the series, we can always construct a new
series which will converge to $\alpha$.


For the proof, we establish first a lemma which is also of inde- 
pendent interest. 


Lemma. If the series (1) is conditionally convergent, then both the 
series (1’) formed by its positive terms and the series (1") formed by 
its negative terms are divergent. 


Proof. For let $s_n = s_n' + s_n''$, where $s_n'$ is the sum of the positive
terms and $s_n''$ the sum of the negative terms which occur in the
partial sum $s_n$. If both series (1’) and (1”) were convergent, then
both of the partial sums $s_n'$ and $s_n''$ would tend to a limit. Hence,
the difference

$$s_n' - s_n'' = |u_1| + |u_2| + \cdots + |u_n|$$

would also tend to a limit, contradicting the fact that (1) is not absolutely
convergent. If one of the series, say (1’), were divergent
and the other convergent, then as $n \to \infty$ we would have $s_n' \to +\infty$
while $s_n'' \to c$, where $c$ is a number. But this would imply that
$s_n = s_n' + s_n'' \to +\infty$ and that (1) is divergent. Hence both series (1’)
and (1”) must necessarily be divergent.


We now turn to the proof of the theorem. There are two parts: 


Proof of part one of Theorem 6. $\alpha$ is a number. Suppose, for
definiteness, that $\alpha > 0$. We shall arrange the terms of the series in
the following order: First, we take positive terms in their natural
order, that is, the order in which they appear in the series; the sum
of such terms will then be increasing. As soon as this sum becomes
larger than $\alpha$, we shall call it the first pivotal sum and begin to add
to it the successive negative terms of the series, again in their
natural order. The sum of the terms will now be decreasing, and,
as soon as this sum becomes less than $\alpha$, we call it the second
pivotal sum and begin to increase it by adding successive positive
terms again. Repeating this process indefinitely, we clearly rearrange
the terms of the series (1) into a new order. We shall only
have to show that the newly formed series has a sum equal to $\alpha$.
Before doing this, however, we must clarify certain details of the
construction just described. How do we know that by taking sufficiently
many positive terms of the series we can obtain a sum
greater than $\alpha$? Here we use our lemma. The series formed from
the positive terms of the series diverges and, therefore, by taking
sufficiently many positive terms we can obtain an arbitrarily large
sum, and in particular one greater than $\alpha$. Using this lemma repeatedly
in the subsequent steps of our construction, we can be certain each
time that the adding of positive or negative terms will eventually
adjust the sum in the desired manner.

Having thus convinced ourselves of the feasibility of this construction,
we must now show that the partial sums of the new
series, which oscillate about $\alpha$, actually approach this number as
a limit. It is evident from our construction that if we take two
successive pivotal sums of the new series, then all of the intermediate
partial sums will be included between them. Hence, it is
enough to show that the sequence of pivotal sums has the limit $\alpha$.
But it is clear that each pivotal sum differs from $\alpha$ by not more than
the absolute value of its last term, a quantity which certainly tends
to zero in view of the convergence of our initial series. Hence our
theorem is established in this case.
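The construction in part one translates almost line by line into code. The sketch below applies it to the series $1 - \frac12 + \frac13 - \cdots$ (our choice of example; the construction itself works for any conditionally convergent series): positive terms are taken in their natural order until the sum exceeds $\alpha$, then negative terms until it falls below $\alpha$, and so on. Unlike the text, which takes $\alpha > 0$ for definiteness, the code accepts any real $\alpha$; the function name is hypothetical.

```python
from itertools import count


def rearranged_partial_sums(alpha, n_terms):
    """Partial sums of 1 - 1/2 + 1/3 - ... rearranged by the pivotal-sum construction."""
    positives = (1 / (2 * j - 1) for j in count(1))  # 1, 1/3, 1/5, ... in natural order
    negatives = (-1 / (2 * j) for j in count(1))     # -1/2, -1/4, ... in natural order
    s = 0.0
    adding_positive = True
    sums = []
    for _ in range(n_terms):
        if adding_positive:
            s += next(positives)
            if s > alpha:        # a pivotal sum: switch to negative terms
                adding_positive = False
        else:
            s += next(negatives)
            if s < alpha:        # a pivotal sum: switch back to positive terms
                adding_positive = True
        sums.append(s)
    return sums


for alpha in (0.5, 2.0, -1.0):
    print(alpha, rearranged_partial_sums(alpha, 200_000)[-1])
```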

Proof of part two of Theorem 6. $\alpha = +\infty$ (if $\alpha = -\infty$ the proof
is entirely analogous). Since $u_n \to 0$ as $n \to \infty$, the set of numbers
$|u_n|$ ($n = 1, 2, 3, \ldots$) is bounded. Let $B$ be its l.u.b. We take
the sequence of positive terms of the series and subdivide it into
blocks so that the sum of the terms of each block is greater than
$2B$ (this is possible in view of our lemma). We now rearrange the
terms of the series (1) in the following way: We take the first of the
blocks just constructed (in its natural order), then add the first negative
term, then the second block, then the second negative term,
and so on. Since the sum of the terms of each block is greater than
$2B$ and each negative term is in absolute value not greater than $B$,
it follows that by taking $n$ positive blocks and $n$ negative terms we
obtain a sum exceeding $2Bn - Bn = Bn$. From this we see that the
partial sums of the newly constructed series increase indefinitely as
$n \to \infty$.
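The construction of part two can also be sketched, again for $1 - \frac12 + \frac13 - \cdots$, where $B = 1$ (the l.u.b. of $|u_n|$, attained at $u_1 = 1$). The function below (a hypothetical name) forms blocks of positive terms whose sum exceeds $2B$, follows each block by one negative term, and records the sum after each such step; by the argument above the $n$th recorded sum exceeds $Bn$. The blocks grow very quickly in length, so we only compute a few of them.

```python
from itertools import count

B = 1.0  # l.u.b. of |u_n| for 1 - 1/2 + 1/3 - ..., attained at u_1 = 1


def diverging_pivots(n_blocks):
    """Sums after each (block of positive terms with sum > 2B) + (one negative term)."""
    positives = (1 / (2 * j - 1) for j in count(1))
    negatives = (-1 / (2 * j) for j in count(1))
    s = 0.0
    pivots = []
    for _ in range(n_blocks):
        block = 0.0
        while block <= 2 * B:     # keep taking positive terms until the block exceeds 2B
            block += next(positives)
        s += block + next(negatives)
        pivots.append(s)
    return pivots


print(diverging_pivots(3))
```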


We have deliberately dwelt on these constructions. They are 
both interesting and instructive, and compel us to look closely into 
the structure of the series, to dissect it logically, so to speak. It 
would be an interesting and useful exercise for you to show that 
by a suitable change of the order of the terms of a conditionally 
convergent series we can obtain a series which has no sum, finite 
or infinite. 


28. INFINITE PRODUCTS 


Besides addition, yet another arithmetic operation, that of 
multiplication, can be applied to an infinite sequence of numbers. 
As we looked previously for the limit of the sum of an infinitely 
increasing number of terms, so now we pose the problem of 
the limit of the product of an infinitely increasing number of fac- 
tors. And we shall, in analogy with the theory of infinite series, 
build up the theory of infinite products. Although this theory, 
basically developed long ago, does not yet have the extensive 
range of applications which the theory of series does, its range 
of applications is nevertheless already so extensive (and grows 
more so with every decade) that today this theory belongs in the 
arsenal of every well-trained mathematician. 

An infinite product is an expression of the form 


$$z_1 z_2 z_3 \cdots z_n \cdots = \prod_{n=1}^{\infty} z_n. \qquad (6)$$

We call the quantities $\pi_n = z_1 z_2 \cdots z_n$ ($n = 1, 2, 3, \ldots$) partial
products, and $\lim_{n\to\infty} \pi_n$, if it exists, is called the value of the product (6).


In what follows, we shall constantly assume that all the factors 
$z_n$ are positive. Such a limitation, always imposed in the case of
real-valued factors, is not brought about by a desire to simplify the 
investigation, but by more cogent considerations. In the first place, 
we may change the sign of the factors to make them all positive 
without affecting anything but the sign of the limit. But more im- 
portant, just as the nth term of a convergent series tends to zero, 
so (as it is natural to expect and as we shall soon demonstrate) the
nth factor of a convergent infinite product tends to one. Hence, all 
the terms from some finite point onward must not only be positive, 
but arbitrarily close to unity. 

When all the factors are positive, the study of infinite products 
can be reduced to the study of infinite series by the simple process 
of taking logarithms. Setting $v_n = \ln z_n$ and $z_n = e^{v_n}$, we see that
the convergence of the product (6) is equivalent to the convergence
of the series

$$v_1 + v_2 + \cdots + v_n + \cdots \qquad (7)$$

and that the sum of the series (in the case of convergence) is equal
to the logarithm of $\pi$, the limit of the product (6). There is only one
exception to this rule: if $\pi = 0$, the series (7) is obviously divergent
(the partial sums tend to $-\infty$). But infinite products with value
zero exhibit in many other respects peculiarities which make
them resemble divergent, rather than convergent, series. Thus,
if $\pi = 0$ it is not necessary that $z_n \to 1$ as $n \to \infty$ (for example,
$z_n = \frac{1}{2}$, $n = 1, 2, \ldots$). For these reasons, the product (6) in the
case of $\pi = 0$ is customarily called divergent, and the convergence
of the product is defined as the existence of a positive limit
$\pi = \lim_{n\to\infty} \pi_n$. With this definition, the convergence of the product (6)
is precisely equivalent to the convergence of the series (7).
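The reduction to the series (7) can be seen on a sample product, $\prod_{n=1}^{\infty}(1 + 1/n^2)$, which is our example rather than the text's. The sketch compares the partial product $\pi_N$ with $e^{v_1 + \cdots + v_N}$, where $v_n = \ln(1 + 1/n^2)$, and with $\sinh \pi / \pi$, the classical value of this product (quoted here, not proved). Function names are hypothetical.

```python
import math


def partial_product(N):
    """pi_N for the factors z_n = 1 + 1/n^2, n = 1, ..., N."""
    return math.prod(1 + 1 / n**2 for n in range(1, N + 1))


def exp_of_log_series(N):
    """exp(v_1 + ... + v_N) with v_n = ln z_n, i.e. the product computed through series (7)."""
    return math.exp(math.fsum(math.log1p(1 / n**2) for n in range(1, N + 1)))


N = 100_000
print(partial_product(N), exp_of_log_series(N), math.sinh(math.pi) / math.pi)
```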

The possibility of completely reducing the theory of infinite 
products to the study of series might lead us to think that there is no 
need to construct a special theory of infinite products. But as 
in many similar instances, such a conclusion is false. As often hap- 
pens in mathematics, a new way of formulating old problems leads 
not only to new methods of solving these problems, but also serves 
a heuristic purpose in suggesting entirely new problems. 

In the theory of infinite products it is customary to write
the factors $z_n$ in the form $1 + u_n$, where $-1 < u_n < +\infty$. Accordingly,
$\pi_n = \prod_{i=1}^{n} (1 + u_i)$ and the product (6) is written in the form

$$(1 + u_1)(1 + u_2)\cdots(1 + u_n)\cdots = \prod_{n=1}^{\infty} (1 + u_n). \qquad (8)$$

If the product converges, then as $n \to \infty$ we have $\pi_n \to \pi$, where
$0 < \pi < +\infty$, so that as $n \to \infty$

$$z_n = \frac{\pi_n}{\pi_{n-1}} \to \frac{\pi}{\pi} = 1.$$


A comparison of the product (8) with the series (7) indicates that 
to any series with positive terms there correspond products for 
which $z_n > 1$ or, what is the same, $u_n > 0$ ($n = 1, 2, \ldots$); while to
series with negative terms, there correspond products for which
$z_n < 1$ and $u_n < 0$. This leads us to begin our study with those infinite
products for which all the $u_n$ are of the same sign. The fundamental
and most frequently used theorem concerning such products
is the following. 


THEOREM 7. If all $u_n$ have the same sign, then for the convergence
of the product (8) it is necessary and sufficient that the series

$$u_1 + u_2 + \cdots + u_n + \cdots \qquad (9)$$

converge.


Proof. First of all, we may assume that $u_n \to 0$ as $n \to \infty$, since
otherwise both the product (8) and the series (9) would diverge,
and the theorem would clearly hold. Hence, we may distinguish
two cases.

(i) Suppose $u_n > 0$, $n = 1, 2, \ldots$. Since as $x \to 0$ the function
$e^x$ differs from $1 + x$ by an infinitely small quantity of the second
order with respect to $x$, it follows that for sufficiently large $n$
we have the inequalities

$$e^{u_n/2} < 1 + u_n < e^{u_n}. \qquad (10)$$

Denoting by $s_n$ the partial sums of the series (9) and supposing, for
simplicity, that the inequalities (10) are satisfied for all $n$ (this does
not restrict the generality of the argument since in problems of
convergence we can always, without affecting the result, drop any
finite number of initial terms), we obtain by multiplying these inequalities
term by term ($n = 1, 2, \ldots, m$)

$$e^{s_m/2} < \pi_m < e^{s_m}.$$

If the series (9) converges, we have $s_m < s < +\infty$, whence
$\pi_m < e^{s} < +\infty$. Thus the quantity $\pi_m$ remains bounded and, consequently,
the product (8) converges. Conversely, if the product (8)
converges, then, as $m \to \infty$ we have $\pi_m < \pi < \infty$, which means
that

$$\tfrac{1}{2} s_m < \ln \pi_m < \ln \pi < +\infty.$$

Hence, the sum $s_m$ remains bounded and the series (9) converges.

(ii) Let $u_n < 0$, $n = 1, 2, \ldots$. As before (but here $s_m < 0$, and the role
of (10) is played by the inequalities $e^{2u_n} < 1 + u_n < e^{u_n}$, valid for
sufficiently large $n$), we obtain

$$e^{2 s_m} < \pi_m < e^{s_m},$$

from which, as in the preceding case, we can see that when
$s_m \to s > -\infty$ (that is, when the series (9) converges) we have
$\pi_m > e^{2s} > 0$ and the product (8) converges. Conversely, when
$\pi_m \to \pi > 0$ we have $s_m > \ln \pi_m > \ln \pi > -\infty$, i.e., from the convergence
of the product (8) there follows the convergence of
the series (9).
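Case (ii) can be illustrated with $u_n = -1/n^2$ for $n \ge 2$ (we start at $n = 2$ because $u_1 = -1$ would give a zero factor). Every such $u_n$ lies in $[-\frac14, 0)$, where one can check that $e^{2u_n} < 1 + u_n < e^{u_n}$, and the partial products telescope to $\pi_m = (m+1)/(2m)$, so the limit is $\frac12$. The sketch below (hypothetical helper name) verifies $e^{2s_m} < \pi_m < e^{s_m}$ for a few $m$.

```python
import math


def check_case_ii(m):
    """For u_n = -1/n^2 (n = 2, ..., m) return (e^(2 s_m), pi_m, e^(s_m))."""
    us = [-1 / n**2 for n in range(2, m + 1)]
    s_m = math.fsum(us)
    pi_m = math.prod(1 + u for u in us)
    return math.exp(2 * s_m), pi_m, math.exp(s_m)


for m in (10, 100, 10_000):
    low, pi_m, high = check_case_ii(m)
    print(m, low, pi_m, high, (m + 1) / (2 * m))  # pi_m telescopes to (m + 1) / (2m)
```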

A consideration of the series (7) associated with the product (6) 
or (8) leads us directly to Cauchy’s condition for infinite products: 


THEOREM 8. For the convergence of the product (8) it is necessary
and sufficient that for an arbitrarily small $\varepsilon > 0$ we have

$$\left| \prod_{i=n+1}^{n+k} (1 + u_i) - 1 \right| < \varepsilon \qquad (11)$$

for $n$ sufficiently large and any $k > 0$.

For those products where $u_n$ is of constant sign, this criterion
can be replaced (by virtue of the preceding theorem) by another
which is more convenient in the majority of cases, since it is easier
to estimate a sum than a product. We may, without changing any
of the verbal part of the statement, simply replace the inequality
(11) by the inequality

$$\left| \sum_{i=n+1}^{n+k} u_i \right| < \varepsilon.$$

We must remember, however, that this form of Cauchy’s condition
is valid only in those cases where all the $u_n$ have the same sign.

Unfortunately, within the framework of these lectures we must 
limit our discussion of the interesting subject of infinite products to 
what we have already said. 


29. SERIES OF FUNCTIONS 


So far, the terms of the series we have considered have been fixed 
numbers. But, as you know, the primary need in analysis is for 
series of functions, that is, series whose terms are functions of 
a single variable or of several independent variables. All that we 
have considered so far will serve as preparatory material for 
the theory of such series. 

Let

$$u_1(x) + u_2(x) + \cdots + u_n(x) + \cdots \qquad (12)$$

be a series whose terms are functions of the independent variable $x$
defined on the closed interval $[a, b]$. Giving this variable a numerical
value $x = \alpha$, we transform the series (12) into an ordinary
numerical series $\sum_{n=1}^{\infty} u_n(\alpha)$. From the point of view of the theory of
numerical series, we can now say that the formula (12) expresses
not a single series, but an entire continuum of different series.

It is clear, of course, that for series of functions the problem of 
convergence is quite different from that for numerical series. It is 
meaningless to ask whether the series (12) converges or not, since, 
in general, this series will converge for some values of x and 
diverge for others. A sensible formulation of the problem is: 
for what values of $x$ in $[a, b]$ does the series (12) converge and
for what values does it diverge? Let us agree to mean by the domain 
of convergence of the given series the set of values x for which the 
series converges; and by domain of divergence, the set of values x 
for which it diverges. We can see that the problem of convergence 
for a series of functions consists, first of all, in finding its domain 
of convergence. We shall not spend any time here with examples, 
since we shall encounter enough of them later on. 

In all analytic applications of the theory of series of functions, 
the notion of uniform convergence is of fundamental importance. 
The best way to approach the definition of this concept is as follows:
If the series (12) converges at each point of a set $M$, then the
remainder

$$r_n(x) = s(x) - s_n(x)$$

(which, like the sum $s(x)$ and the partial sums $s_n(x)$, is also a function
of $x$) tends to zero as $n \to \infty$ for each $x \in M$. But for our purposes,
we must formulate a more detailed description of this fact.
Thus, for any $x \in M$ and for any $\varepsilon > 0$ there is an $n_0$ (depending,
of course, on both $\varepsilon$ and $x$) such that for all $n > n_0$ we have
$|r_n(x)| < \varepsilon$. Now let $\varepsilon$ remain fixed and let $x$ vary in $M$. To each $x$
there corresponds an $n_0$, i.e., a place in the given series, beyond
which $|r_n(x)| < \varepsilon$. But does there exist in this series a place beyond
which the inequality $|r_n(x)| < \varepsilon$ is satisfied for all $x$ in $M$? This, of
course, depends on whether the set of numbers $n_0$ corresponding
to the various values of $x$ is bounded or not. If among these numbers
there is a greatest one, it can obviously serve as the place beyond
which we shall have $|r_n(x)| < \varepsilon$ for all $x$ in $M$. If, however,
among the numbers $n_0$ we can find arbitrarily large values, then,
however far we proceed in our series, we shall always find values
of $x$ in $M$ for which the place corresponding to $\varepsilon$ has not yet been
reached. In this case it is impossible to find an $n_0$ applicable to all
$x \in M$.

Let us look at an example of this latter type. Suppose that

$$s_n(x) = x^n(1 - x^n) \qquad (0 \le x \le 1;\ n = 1, 2, \ldots),$$

that is, let

$$u_n(x) = x^n(1 - x^n) - x^{n-1}(1 - x^{n-1}).$$

Obviously, for each $x \in [0, 1]$ we have $s_n(x) \to 0$ as $n \to \infty$, so that

$$r_n(x) = -s_n(x) = -x^n(1 - x^n),$$

from which

$$r_n\!\left(2^{-1/n}\right) = -\frac{1}{4} \qquad (n = 1, 2, \ldots).$$

Since, for any $n > 0$, the number $2^{-1/n}$ belongs to the interval $[0, 1]$,
it follows that however large $n$ may be, the inequality $|r_n(x)| < \frac{1}{4}$
cannot be satisfied for all $x \in [0, 1]$. This means that we actually
have here the second of the cases described above.

[Fig. 16: the curves $y = |r_1(x)|$ and $y = |r_n(x)|$ (for a large $n$) on $0 \le x \le 1$]


To illustrate this graphically, we give in Figure 16 the curve $y = |r_1(x)|$
and (schematically) the curve $y = |r_n(x)|$ for a large $n$. The
curve $y = |r_n(x)|$ has a maximum, equal to $\frac{1}{4}$, at $x = 2^{-1/n}$. Everywhere
to the left of that point, except for its immediate neighborhood,
the quantity $|r_n(x)|$ is extremely small. As $n$ increases the
point $2^{-1/n}$ approaches the value 1. From the graph we can see what
actually occurs. The fact, paradoxical at first glance, that for each
$x$ we have $r_n(x) \to 0$ as $n \to \infty$, while for every $n$ there exists an $x$
such that $|r_n(x)| = \frac{1}{4}$, can be explained by noting that the point at
which $|r_n(x)|$ attains this maximum value does not remain unchanged,
but rather moves to the right, approaching the value
1 as $n \to \infty$. This delay in the approach of $r_n(x)$ to zero moves,
therefore, closer and closer to the point 1 as $n$ increases. We
naturally expect that at the point 1 itself, the arrival of this delay
will cause the divergence of the series at this point. But in reality
everything turns out well since $r_n(1) = 0$ for all $n$.
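The behavior described above can be checked numerically. In the sketch below (the helper names are ours), `r(n, x)` is the remainder $-x^n(1 - x^n)$; at $x = 2^{-1/n}$ its absolute value is $\frac14$ for every $n$, and a grid estimate of $\sup_{0 \le x \le 1} |r_n(x)|$ stays close to $\frac14$, while for a fixed $x < 1$ the remainder goes to zero and $r_n(1) = 0$.

```python
def r(n, x):
    """Remainder r_n(x) = -x^n (1 - x^n) for the series with s_n(x) = x^n (1 - x^n)."""
    t = x ** n
    return -t * (1 - t)


def grid_sup(n, points=100_000):
    """Approximate sup over [0, 1] of |r_n(x)| on a uniform grid of points + 1 nodes."""
    return max(abs(r(n, i / points)) for i in range(points + 1))


for n in (1, 10, 100, 1000):
    peak = 2 ** (-1 / n)  # where |r_n| reaches its maximum 1/4; moves toward 1 as n grows
    print(n, peak, abs(r(n, peak)), grid_sup(n), abs(r(n, 0.9)))
```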

Let us now return to the two cases which we previously agreed 
to distinguish. In the first of them we say that the series (12) converges
uniformly in the set $M$, and in the second that it is nonuniformly
convergent. That is, the series (12) is uniformly convergent in
a set $M$ if for an arbitrarily small positive $\varepsilon$ there exists an $n_0$
(depending only on $\varepsilon$) such that for all $n > n_0$ and for all $x \in M$ we
have

$$|r_n(x)| < \varepsilon.$$


We have discussed above in detail an example presenting a typical 
picture of nonuniform convergence. 

Let us observe that we can also define uniform convergence of a
sequence of functions

$$f_1(x), f_2(x), \ldots, f_n(x), \ldots$$

to a limit function $f(x)$ in the same way if $r_n(x)$ is understood
to mean the difference $f(x) - f_n(x)$. Thus, we can say that the uniform
convergence of a series is equivalent to the uniform convergence
of the sequence of its partial sums.

Just as absolute convergence is important for arithmetical oper- 
ations with series, so the hypothesis of uniform convergence is not 
only convenient, but sometimes even indispensable for a great 
many conclusions of an analytic nature. We shall now look at a 
few examples demonstrating this. 

First, let us consider the following problem: under what condi- 
tions can we deduce the continuity of the sum s(x) of the series (12) 
in the interval [a, b] from the continuity of its terms? That the sum 
s(x) may be discontinuous even when its terms are continuous is 
shown by the following example: 


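$$u_n(x) = x^{n-1} - x^n \qquad (n \ge 1,\ 0 \le x \le 1;\ x^0 = 1),$$

$$s_n(x) = 1 - x^n \qquad (n \ge 1),$$

$$s(x) = \begin{cases} 1 & \text{if } 0 \le x < 1, \\ 0 & \text{if } x = 1. \end{cases}$$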

We shall now prove the following: 


THEOREM 9. If the series of functions (12) converges uniformly, 
then the continuity of its terms implies the continuity of its sum. 


Proof. Let the series (12) converge uniformly in $[a, b]$ and let all
its terms $u_n(x)$, and so all its partial sums $s_n(x)$, be continuous; let
$\alpha$ be any point in $[a, b]$ and let $\varepsilon$ be an arbitrarily small positive
number. By virtue of the uniform convergence of (12), we can
choose $n$ sufficiently large so that

$$|s_n(x) - s(x)| < \frac{\varepsilon}{3}$$

for all $x \in [a, b]$. Fixing temporarily this value of $n$ and using the
continuity of $s_n(x)$ in $[a, b]$ and, in particular, at the point $\alpha$, we can
assert that for all $x$ in a neighborhood $U$ of $\alpha$ we have the inequality

$$|s_n(x) - s_n(\alpha)| < \frac{\varepsilon}{3}.$$

Then, for each $x \in U$ we have

$$|s(x) - s(\alpha)| \le |s(x) - s_n(x)| + |s_n(x) - s_n(\alpha)| + |s_n(\alpha) - s(\alpha)| < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon,$$

which shows that $s(x)$ is continuous at $x = \alpha$.


The converse of the theorem just proved is not true. The sum 
of a series may in some cases be a continuous function even 
though the series does not converge uniformly. This can be ob- 
served in the example given on page 89, where the series is non- 
uniformly convergent while its sum is equal to zero in the whole 
interval under consideration. Thus, the uniform convergence of a 
series of continuous functions, though it guarantees the continuity 
of its sum, is not a necessary prerequisite for this continuity. A 
theory somewhat more fully developed than the considerations we 
have so far presented makes it possible to determine a type of con- 
vergence which constitutes both a necessary and sufficient condi- 
tion for the continuity of the sum. In practice, however, we nearly 
always deduce the continuity of the sum from the uniform conver- 
gence of the series. 

Another problem in which the idea of uniform convergence 
finds application is the problem of term by term integration of 
a series of functions. Let us assume that all the terms of the series 
(12) are continuous (and thus integrable) functions on a closed in- 
terval [a, b]. Can we assert that the sum s(x) of the series is 
also integrable over this interval and that 


$$\int_a^b s(x)\,dx = \sum_{n=1}^{\infty} \int_a^b u_n(x)\,dx \qquad (13)$$


as in the case of finite sums? It is easy to see that this is true in all 
the examples we have considered so far. The relation (13) is 
obviously equivalent to either of the relations 


$$\lim_{n \to \infty} \int_a^b s_n(x)\,dx = \int_a^b s(x)\,dx$$

or

$$\lim_{n \to \infty} \int_a^b r_n(x)\,dx = 0.$$
[Figs. 17 and 18]

It is possible, however, to show that these relations are not always valid. First of all, it may turn out that $s(x)$ is not integrable over $[a, b]$. To show this it is sufficient to take the function (Fig. 17)


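$$s_n(x) = \begin{cases} n^2 x & \text{if } 0 \le x \le \dfrac{1}{n}, \\[4pt] \dfrac{1}{x} & \text{if } \dfrac{1}{n} \le x \le 1 \end{cases}$$

(you can easily find, if you wish, the corresponding expressions for $u_n(x)$). Here the sum

$$s(x) = \begin{cases} \dfrac{1}{x} & \text{if } 0 < x \le 1, \\[4pt] 0 & \text{if } x = 0, \end{cases}$$

is, as you know, nonintegrable over $[0, 1]$ since

$$\int_\alpha^1 s(x)\,dx = \ln\frac{1}{\alpha} \to \infty \quad \text{as } \alpha \to 0.$$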
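The partial sums themselves are perfectly integrable. Assuming $s_n(x) = n^2 x$ on $[0, 1/n]$ and $s_n(x) = 1/x$ on $[1/n, 1]$, we get $\int_0^{1/n} n^2 x\,dx = \tfrac12$ and $\int_{1/n}^1 \frac{dx}{x} = \ln n$. So $\int_0^1 s_n(x)\,dx = \tfrac12 + \ln n \to \infty$, which matches the nonintegrability of the limit $s(x)$. The illustrative Python sketch below checks this closed form with a plain midpoint rule, splitting the interval at the corner $x = 1/n$.

```python
import math


def s_n(n: int, x: float) -> float:
    # Partial sum of the Fig. 17 example: the line n**2 * x up to x = 1/n, then 1/x.
    return n * n * x if x <= 1.0 / n else 1.0 / x


def midpoint(f, a: float, b: float, steps: int) -> float:
    # Composite midpoint rule for the integral of f over [a, b].
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def integral_of_s_n(n: int) -> float:
    def f(x: float) -> float:
        return s_n(n, x)

    # Split at the corner x = 1/n so that each piece is smooth.
    return midpoint(f, 0.0, 1.0 / n, 100) + midpoint(f, 1.0 / n, 1.0, 20_000)


for n in (2, 10, 100, 1000):
    print(n, integral_of_s_n(n), 0.5 + math.log(n))
```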


Secondly, it may happen that $s(x)$ is integrable over $[a, b]$, yet (13) turns out to be false. For example (Fig. 18), let¹

$$s_n(x) = n\left(x^{n-1} - x^{2n-2}\right) \quad (0 \le x \le 1).$$

¹ Here again, $0^0 = 1$.
Since for $0 < x < 1$ we have $\lim_{n \to \infty} n x^{n-1} = 0$,¹ it follows that $s(x) = 0$ $(0 \le x \le 1)$; hence

$$\int_0^1 s(x)\,dx = 0.$$


At the same time (as can easily be computed), we have

$$\int_0^1 s_n(x)\,dx = n\int_0^1 \left(x^{n-1} - x^{2n-2}\right)dx = n\left(\frac{1}{n} - \frac{1}{2n-1}\right) = \frac{n-1}{2n-1} \to \frac{1}{2},$$

and so (13) does not hold.
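The same failure can be observed numerically. The illustrative Python sketch below assumes $s_n(x) = n(x^{n-1} - x^{2n-2})$ as above. `exact_integral` evaluates the closed form $(n-1)/(2n-1)$ in exact rational arithmetic, and `midpoint_integral` cross-checks it by quadrature. Evaluating $s_n$ at a fixed interior point shows the pointwise limit 0, while the integrals approach $\tfrac12$.

```python
from fractions import Fraction


def s_n(n: int, x: float) -> float:
    # Python evaluates 0.0 ** 0 as 1.0, matching the convention 0^0 = 1.
    return n * (x ** (n - 1) - x ** (2 * n - 2))


def exact_integral(n: int) -> Fraction:
    # n * (1/n - 1/(2n - 1)) = (n - 1) / (2n - 1)
    return n * (Fraction(1, n) - Fraction(1, 2 * n - 1))


def midpoint_integral(n: int, steps: int = 100_000) -> float:
    h = 1.0 / steps
    return h * sum(s_n(n, (i + 0.5) * h) for i in range(steps))


for n in (2, 10, 100, 1000):
    # The integrals approach 1/2 ...
    print(n, exact_integral(n), float(exact_integral(n)))
# ... while the functions themselves tend to 0 at each fixed x.
print(s_n(2000, 0.99))
```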
But it is easy to prove the following: 


THEOREM 10. For every uniformly convergent series of continuous 
functions, the relation (13) is valid. 


Proof. First of all, we know that in this case $s(x)$ is continuous and therefore integrable over $[a, b]$. Further, for every $\varepsilon > 0$ there exists an integer $n_0$ such that for $n > n_0$ and any $x \in [a, b]$ we have

$$|r_n(x)| < \varepsilon.$$


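Applying the well-known theorem from the integral calculus by which the inequality

$$|\varphi(x)| \le \psi(x) \quad (a \le x \le b)$$

implies the inequality

$$\left|\int_a^b \varphi(x)\,dx\right| \le \int_a^b \psi(x)\,dx,$$

we have

$$\left|\int_a^b r_n(x)\,dx\right| \le \varepsilon(b - a) \quad \text{for } n > n_0,$$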


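¹ An elementary proof of this property is as follows. Let $0 < x < 1$, $z = \frac{1}{x}$, and $z - 1 = y > 0$. Then, for $n \ge 2$,

$$z^n = (1 + y)^n \ge 1 + ny + \frac{n(n-1)}{2}y^2 > \frac{n(n-1)}{2}y^2,$$

from which

$$n x^n = \frac{n}{z^n} < \frac{2}{(n-1)y^2} \to 0 \quad \text{as } n \to \infty,$$

and therefore also $n x^{n-1} = \frac{n x^n}{x} \to 0$.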
and this, in view of the arbitrary choice of $\varepsilon$, means that

$$\int_a^b r_n(x)\,dx \to 0 \quad \text{as } n \to \infty,$$

which is equivalent to (13).
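Here is a concrete instance of Theorem 10 (our own example, not from the text). The geometric series $\sum_{n \ge 0} x^n$ converges uniformly on $[0, \tfrac12]$, because $|x^n| \le 2^{-n}$ there. Integrating term by term must therefore reproduce $\int_0^{1/2} \frac{dx}{1-x} = \ln 2$. The Python sketch below adds up the term integrals $\int_0^{1/2} x^n\,dx = \frac{(1/2)^{n+1}}{n+1}$ and compares the result with $\ln 2$.

```python
import math


def termwise_integral(terms: int) -> float:
    # Sum of the integrals of x**n over [0, 1/2] for n = 0, ..., terms - 1;
    # each one equals (1/2)**(n + 1) / (n + 1).
    return sum(0.5 ** (n + 1) / (n + 1) for n in range(terms))


exact = math.log(2.0)  # integral of 1 / (1 - x) over [0, 1/2]
for terms in (5, 10, 20, 60):
    print(terms, termwise_integral(terms), exact)
```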


Although uniform convergence is a sufficient condition for the 
term by term integrability of a series of continuous functions, it is, 
as in the problem of continuity of the sum, not a necessary condi- 
tion. To show this, it is enough to consider again the example 
on p. 89. Here the convergence is nonuniform, but at the same time 
we have 





$$\int_0^1 r_n(x)\,dx = \int_0^1 x^{2n}\,dx - \int_0^1 x^{n}\,dx = \frac{1}{2n+1} - \frac{1}{n+1} \to 0$$

as $n \to \infty$, that is, (13) holds true.
The uniform convergence of a given series is most frequently 
established with the aid of the following simple test: 


THEOREM 11. If $\sum a_n$ is a convergent series with positive terms and if for $n$ sufficiently large and all $x \in M$ we have

$$|u_n(x)| \le a_n,$$

then the series (12) converges uniformly on $M$.


Proof. Under the conditions of our hypothesis, we have

$$\left|\sum_{i=n+1}^{n+k} u_i(x)\right| \le \sum_{i=n+1}^{n+k} |u_i(x)| \le \sum_{i=n+1}^{n+k} a_i < \varepsilon$$

for a sufficiently large $n$, any $k > 0$, and all $x$ in $M$. (The last of this string of inequalities follows from the convergence of $\sum_{n=1}^{\infty} a_n$ and Cauchy's condition.) Applying Cauchy's condition, it follows that the series (12) converges. Letting $k \to \infty$ in the inequality

$$\left|\sum_{i=n+1}^{n+k} u_i(x)\right| < \varepsilon,$$

we see that for sufficiently large $n$ and for all $x \in M$ we have

$$|r_n(x)| \le \varepsilon.$$

Hence the convergence is uniform.
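Theorem 11 is the test commonly known as the Weierstrass M-test. As an illustration (our own choice of series, not from the text), take $u_k(x) = \frac{\sin kx}{k^2}$ with the majorant $a_k = \frac{1}{k^2}$. The Python sketch below checks, on a grid of points, that $|s_m(x) - s_n(x)|$ never exceeds $\sum_{k=n+1}^{m} a_k$. That bound does not depend on $x$, and this is exactly what makes the convergence uniform.

```python
import math


def partial_sum(x: float, n: int) -> float:
    # s_n(x) = sum_{k=1}^{n} sin(k x) / k**2
    return sum(math.sin(k * x) / k**2 for k in range(1, n + 1))


def majorant_tail(n: int, m: int) -> float:
    # sum_{k=n+1}^{m} 1 / k**2, independent of x
    return sum(1.0 / k**2 for k in range(n + 1, m + 1))


def max_gap_on_grid(n: int, m: int, points: int = 200) -> float:
    # Largest |s_m(x) - s_n(x)| over an evenly spaced grid on [0, 2*pi]
    xs = [2 * math.pi * i / points for i in range(points + 1)]
    return max(abs(partial_sum(x, m) - partial_sum(x, n)) for x in xs)


for n in (5, 20, 50):
    print(n, max_gap_on_grid(n, 400), majorant_tail(n, 400))
```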
30. POWER SERIES


Without a doubt, the most important special class of series of functions consists of the so-called power series, that is, series of the form

$$a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n + \cdots, \qquad (14)$$

where $a_0, a_1, a_2, \ldots, a_n, \ldots$ are given real numbers. We shall now examine some of the most important properties of these series.

To begin with, for any given power series there always exists an open interval $(-r, r)$, the so-called interval of convergence, within which the given series converges and outside of which it diverges (with the possible exception of the points $-r$ and $r$). The radius of convergence $r$ may have any value from 0 to $\infty$, not excluding either of these limiting values (for example, $r = \infty$ when $a_n = \frac{1}{n!}$ and $r = 0$ when $a_n = n!$). When the radius of convergence is given, the interval of convergence is determined only up to its two end points and may be either closed, open, or half-open (i.e., it may include one of its end points, but not the other). For example, each of the three series with the respective coefficients $a_n = 1$, $a_n = \frac{1}{n+1}$, and $a_n = \frac{1}{(n+1)^2}$ has a radius of convergence equal to 1. But the first of these series diverges for $x = 1$ and $x = -1$, that is, for both end points of the interval of convergence (this series is a geometric progression with the ratio $x$); the second series reduces to the harmonic series for $x = 1$ and to Leibniz's series

$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \cdots$$

for $x = -1$, and thus it is divergent at $x = 1$ and conditionally convergent at $x = -1$; and the third series is absolutely convergent at both end points of the interval of convergence.

We shall recall briefly the proof of our assertions concerning the 
domain of convergence of a power series. It is based on the follow- 
ing remarkable theorem. 


THEOREM 12. If a power series is convergent at $x = a$, where $a \ne 0$, then it is absolutely convergent at every value of $x$ for which $|x| < |a|$.
Proof. From the convergence of the series (14) for $x = a$ it follows that $a_n a^n \to 0$ as $n \to \infty$. This implies that there exists a number $c > 0$ such that $|a_n a^n| < c$ $(n = 1, 2, \ldots)$. Let us now take $|x| < |a|$; then

$$\sum_{k=1}^{n} |a_k x^k| = \sum_{k=1}^{n} |a_k a^k| \left|\frac{x}{a}\right|^k \le c\sum_{k=1}^{n}\left|\frac{x}{a}\right|^k \le \frac{c\,|x/a|}{1 - |x/a|}.$$

Since the right-hand side does not depend on $n$, the left-hand side remains bounded as $n \to \infty$, and thus our series is absolutely convergent.


We obtain at once from this theorem the special form of 
the domain of convergence for a power series, since, given any 
point x in this domain, all the points lying closer to zero also be- 
long to the domain. Another corollary of this theorem is that 
at every interior point of the interval of convergence the series 
is absolutely convergent. As to uniform convergence, we cannot 
assert that it holds for the whole interval of convergence. This is 
apparent from the example of the geometric series ($a_n = 1$): no matter how large $n$ may be, the remainder $r_n(x) = \sum_{k=n+1}^{\infty} x^k$ becomes arbitrarily large when $x$ is sufficiently close to 1. It is easy, however, to show the following:


THEOREM 13. A power series is uniformly convergent in every 
closed interval whose end points are interior to the interval of
convergence. 


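Proof. To show this, let $r$ be the radius of convergence of the series (14) and let $0 < r' < r$; for all $x$ satisfying the inequality $|x| \le r'$ and for any $n$ we have

$$|a_n x^n| \le |a_n| r'^n.$$

Since the series $\sum_{n=1}^{\infty} |a_n| r'^n$ is a convergent series with positive terms (by Theorem 12, because $r'$ lies inside the interval of convergence), it follows (in view of Theorem 11 on page 95) that (14) is uniformly convergent in the interval $[-r', r']$. Since every closed interval whose end points are interior to $(-r, r)$ lies in such an interval $[-r', r']$, the theorem follows.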
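For the geometric series ($a_n = 1$, $r = 1$) everything is explicit. The remainder is $r_n(x) = \frac{x^{n+1}}{1-x}$, so on $[-r', r']$ its supremum is $\frac{r'^{n+1}}{1 - r'}$, which tends to 0. Near $x = 1$, on the other hand, the remainder is huge. The illustrative Python sketch below estimates the supremum on a grid and compares it with this closed form.

```python
def geometric_remainder(x: float, n: int) -> float:
    # r_n(x) = sum_{k=n+1}^inf x**k = x**(n+1) / (1 - x), valid for |x| < 1
    return x ** (n + 1) / (1.0 - x)


def sup_remainder(r_prime: float, n: int, points: int = 1000) -> float:
    # Grid estimate of sup |r_n(x)| over [-r', r'] (the maximum is at x = r')
    xs = [-r_prime + 2 * r_prime * i / points for i in range(points + 1)]
    return max(abs(geometric_remainder(x, n)) for x in xs)


for n in (10, 50, 200):
    print(n, sup_remainder(0.9, n), 0.9 ** (n + 1) / (1 - 0.9))

# No bound of this kind works on the whole interval (-1, 1):
print(geometric_remainder(0.9999, 200))
```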
From this there follows the highly important corollary.


COROLLARY. The sum of a power series is continuous at every point within the interval of convergence.


ABEL'S THEOREM. If a power series (14) is convergent at $x = r > 0$, then it is uniformly convergent in the interval $[0, r]$.


This remarkable theorem plays an important role in the theory 
of functions. To prove it, we first establish the following lemma, 
which is likewise due to Abel: 


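LEMMA.

$$\sum_{n=n_1}^{n_2} a_n (b_n - b_{n-1}) = \sum_{n=n_1}^{n_2 - 1} b_n (a_n - a_{n+1}) - a_{n_1} b_{n_1 - 1} + a_{n_2} b_{n_2},$$

where $0 < n_1 < n_2$ and the sequences $a_n$ and $b_n$ are arbitrary.


Proof of Lemma.

$$\begin{aligned}
\sum_{n=n_1}^{n_2} a_n (b_n - b_{n-1}) &= a_{n_1}(b_{n_1} - b_{n_1-1}) + a_{n_1+1}(b_{n_1+1} - b_{n_1}) + a_{n_1+2}(b_{n_1+2} - b_{n_1+1}) \\
&\quad + \cdots + a_{n_2-1}(b_{n_2-1} - b_{n_2-2}) + a_{n_2}(b_{n_2} - b_{n_2-1}) \\
&= -a_{n_1} b_{n_1-1} + b_{n_1}(a_{n_1} - a_{n_1+1}) + b_{n_1+1}(a_{n_1+1} - a_{n_1+2}) \\
&\quad + \cdots + b_{n_2-1}(a_{n_2-1} - a_{n_2}) + b_{n_2} a_{n_2} \\
&= \sum_{n=n_1}^{n_2-1} b_n (a_n - a_{n+1}) - a_{n_1} b_{n_1-1} + a_{n_2} b_{n_2}.
\end{aligned}$$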
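The lemma is a discrete analogue of integration by parts, usually called summation by parts. Because it is a purely algebraic identity, it can be checked exactly on integer sequences. The illustrative Python sketch below compares both sides for many randomly chosen $a_n$, $b_n$, $n_1$, $n_2$ with $0 < n_1 < n_2$. The random test data is our own and has nothing to do with the text.

```python
import random


def lhs(a, b, n1, n2):
    # sum_{n=n1}^{n2} a_n (b_n - b_{n-1})
    return sum(a[n] * (b[n] - b[n - 1]) for n in range(n1, n2 + 1))


def rhs(a, b, n1, n2):
    # sum_{n=n1}^{n2-1} b_n (a_n - a_{n+1}) - a_{n1} b_{n1-1} + a_{n2} b_{n2}
    return (sum(b[n] * (a[n] - a[n + 1]) for n in range(n1, n2))
            - a[n1] * b[n1 - 1] + a[n2] * b[n2])


rng = random.Random(0)
for _ in range(1000):
    size = rng.randint(3, 30)
    a = [rng.randint(-50, 50) for _ in range(size)]
    b = [rng.randint(-50, 50) for _ in range(size)]
    n1 = rng.randint(1, size - 2)      # n1 >= 1, so b[n1 - 1] exists
    n2 = rng.randint(n1 + 1, size - 1)
    assert lhs(a, b, n1, n2) == rhs(a, b, n1, n2)
print("identity holds on all random samples")
```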
We now turn to the proof of the theorem itself. 


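Proof of Abel's Theorem. Let $\varepsilon$ be any positive number. By virtue of Cauchy's condition, we have

$$\left|\sum_{i=n+1}^{n+k} a_i r^i\right| < \varepsilon$$

for all sufficiently large $n$ and any $k > 0$. For brevity, let us set

$$\sigma_k = \sum_{i=n+1}^{n+k} a_i r^i, \qquad \sigma_0 = 0,$$

so that $|\sigma_k| < \varepsilon$ $(k = 0, 1, 2, \ldots)$. Then, for $0 \le x \le r$ we have

$$\sum_{i=n+1}^{n+k} a_i x^i = \sum_{i=n+1}^{n+k} a_i r^i \left(\frac{x}{r}\right)^i.$$

Since $a_{n+j} r^{n+j} = \sigma_j - \sigma_{j-1}$, the lemma just proved (applied to the sequences $(x/r)^{n+j}$ and $\sigma_j$, with $n_1 = 1$, $n_2 = k$), together with the fact that $\sigma_0 = 0$, gives

$$\left|\sum_{i=n+1}^{n+k} a_i x^i\right| = \left|\sum_{j=1}^{k-1} \sigma_j\left[\left(\frac{x}{r}\right)^{n+j} - \left(\frac{x}{r}\right)^{n+j+1}\right] + \sigma_k\left(\frac{x}{r}\right)^{n+k}\right|$$

$$\le \varepsilon\sum_{j=1}^{k-1}\left(\frac{x}{r}\right)^{n+j}\left(1 - \frac{x}{r}\right) + \varepsilon\left(\frac{x}{r}\right)^{n+k} = \varepsilon\left(\frac{x}{r}\right)^{n+1} \le \varepsilon.$$

It follows, by Cauchy's condition, that the series (14) converges, and passing to the limit as $k \to \infty$ we see from the last inequality that for sufficiently large $n$

$$|r_n(x)| \le \varepsilon \qquad (0 \le x \le r),$$

which means, precisely, that the series (14) converges uniformly in the closed interval $[0, r]$.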


It should be stressed, however, that from the convergence of (14) at $x = r$ it would be false to conclude that the series converges uniformly in the interval $(-r, r)$. For example, the series

$$x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots + (-1)^{n-1}\frac{x^n}{n} + \cdots$$

converges at $x = 1$. If this series were uniformly convergent in the open interval $(-1, 1)$, then for any $\varepsilon > 0$ we would have

$$\left|\sum_{n=m+1}^{m+k} \frac{(-1)^{n-1} x^n}{n}\right| < \varepsilon$$

for sufficiently large $m$, all $k > 0$, and all $x$ such that $-1 < x < 1$. Letting $x \to -1$, we obtain

$$\sum_{n=m+1}^{m+k} \frac{1}{n} \le \varepsilon$$

for $m$ sufficiently large and all $k > 0$. But this contradicts the divergence of the harmonic series $\sum_{n=1}^{\infty} \frac{1}{n}$.


It is immediately clear that one of the most important problems in connection with power series consists in determining the radius of convergence $r$ from the given coefficients $a_n$. As you probably know, the elementary criteria for convergence of numerical series with positive terms make the solution of this problem possible under certain hypotheses. For example, if there exists $\lim_{n \to \infty} \sqrt[n]{|a_n|} = l$, then $r = \frac{1}{l}$, $+\infty$, or $0$ according to whether $0 < l < +\infty$, $l = 0$, or $l = +\infty$ respectively. A more complete solution to the problem is given in the following theorem.


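THEOREM 14. Let $l = \limsup_{n \to \infty} \sqrt[n]{|a_n|}$. Then

$$r = \begin{cases} \dfrac{1}{l} & \text{if } 0 < l < +\infty, \\[4pt] +\infty & \text{if } l = 0, \\[4pt] 0 & \text{if } l = +\infty. \end{cases}$$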


The indicated upper limit (finite or infinite) exists for every series, and thus we have a solution of the given problem without any restrictive hypotheses on the coefficients. Before taking up the proof of the theorem, however, let us recall that if the number $l$ is the upper limit of a sequence, then for any positive $\varepsilon$ all the terms of that sequence are smaller than $l + \varepsilon$ from a certain point on; on the other hand, arbitrarily far in the given sequence we may find terms greater than $l - \varepsilon$.


Proof of Theorem 14. Let $0 < l < +\infty$. We set $r = \frac{1}{l}$, and we must prove that the series (14) converges for $0 < x < r$ and diverges for $x > r$.

Consider $0 < x < r$, so that $lx < 1$. Clearly, it is possible to find a positive number $\varepsilon$ so small that $(l + \varepsilon)x < 1$. From the definition of $l$, it follows that for $n$ sufficiently large we have

$$\sqrt[n]{|a_n|} < l + \varepsilon, \quad \text{or} \quad |a_n| < (l + \varepsilon)^n,$$

and therefore

$$|a_n| x^n < [(l + \varepsilon)x]^n.$$

Since $(l + \varepsilon)x < 1$, the terms of the series (14) for sufficiently large $n$ do not exceed in their absolute value the corresponding terms of a convergent geometric progression. From this there follows the convergence of (14).
Now consider $x > r$, so that $lx > 1$. For a sufficiently small positive $\varepsilon$ we shall have $(l - \varepsilon)x > 1$. From the definition of $l$ it follows that there exist arbitrarily large values of $n$ for which $\sqrt[n]{|a_n|} > l - \varepsilon$, from which it follows that $|a_n| > (l - \varepsilon)^n$, and therefore

$$|a_n| x^n > [(l - \varepsilon)x]^n > 1.$$

Hence the series (14) contains infinitely many terms whose absolute value exceeds one. Therefore, the general term of the series cannot be infinitely small and the series is divergent.

Let $l = 0$. We have to prove that the series (14) converges for every positive value of $x$. From the definition of $l$, it follows that for $n$ sufficiently large we have the inequalities

$$\sqrt[n]{|a_n|} < \frac{1}{2x}, \quad |a_n| < \frac{1}{(2x)^n}, \quad \text{and} \quad |a_n| x^n < \frac{1}{2^n},$$

that is, the $n$th term of our series for $n$ sufficiently large is smaller in absolute value than the $n$th term of a convergent geometric series. Therefore the series (14) converges.

Let $l = +\infty$. We have to prove that the series diverges for every positive value of $x$. From the definition of $l$ it follows that for $n$ arbitrarily large there occurs the inequality

$$\sqrt[n]{|a_n|} > \frac{1}{x}, \quad \text{or} \quad |a_n| x^n > 1.$$

Again, the general term of the series fails to approach zero as $n \to \infty$ and the series is divergent. Thus the theorem is established.
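Theorem 14 reduces the radius of convergence to a computation on the coefficients. The numerical sketch below (our own, in Python) is rough by design. An upper limit cannot be computed from finitely many terms, so the code simply evaluates $\sqrt[n]{|a_n|}$ at a single large $n$. That shortcut is adequate only when the sequence $\sqrt[n]{|a_n|}$ actually converges, as it does for the example coefficients used earlier in this section. To avoid overflow, the sketch works with $\log|a_n|$; `math.lgamma(n + 1)` gives $\log n!$.

```python
import math


def root_test_value(log_abs_a_n: float, n: int) -> float:
    # |a_n| ** (1/n), computed from log|a_n| to avoid overflow and underflow
    return math.exp(log_abs_a_n / n)


def radius_estimate(log_abs_coeff, n: int) -> float:
    # Heuristic: the value at one large n stands in for the upper limit l.
    l = root_test_value(log_abs_coeff(n), n)
    return math.inf if l == 0.0 else 1.0 / l


# log|a_n| for the example coefficients mentioned earlier in this section
EXAMPLES = {
    "1": lambda n: 0.0,
    "1/(n+1)": lambda n: -math.log(n + 1),
    "1/(n+1)^2": lambda n: -2.0 * math.log(n + 1),
    "1/n!": lambda n: -math.lgamma(n + 1),
    "n!": lambda n: math.lgamma(n + 1),
}

for name, log_coeff in EXAMPLES.items():
    print(name, [radius_estimate(log_coeff, n) for n in (10, 100, 10_000)])
```

The first three estimates settle near 1. The estimate for $a_n = 1/n!$ keeps growing, consistent with $r = \infty$, and the estimate for $a_n = n!$ shrinks toward 0.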
5. The Derivative


31. THE DERIVATIVE AND DERIVATES 


Up to now, we have been preoccupied either with auxiliary 
analytic constructions or with the analysis of fundamental con- 
cepts. We now pass to the central edifice of mathematical analysis, 
the theory of differentiation and integration. 

Let y = f(x) be defined in a neighborhood of the point x = a. 
If from that point we pass to the point a + h, the function y will 
acquire an increment f(a + h) — f(a). If we wish to get an idea of 
how fast y changes as x changes, that is, to what degree the func- 
tion f(x) is sensitive to a variation of its independent variable, we 
must compare in some manner the increment of $y$ with the change $h$ in $x$. For this purpose, it is most natural to consider the ratio

$$\frac{f(a + h) - f(a)}{h}, \qquad (1)$$

which gives us the average change of $y$ per unit of change of $x$. This computation must, however, be performed for a particular value of $h$ and, in general, will give different results for different $h$. In order to obtain a unique solution for the given problem, we would have to agree to choose the quantity $h$ according to some uniform rule.

If our purpose is to investigate the behavior of f(x) in the im- 
mediate vicinity of the point a, then it is obvious that the smaller 
we choose |h| the better the quantity (1) will serve as a measure of 
the variability of f(x) at this point. For the quantity (1) characterizes
the average variability of the function in the interval [a, a + h],
and the smaller the value of h, the more closely does this interval 
adhere to the point a. Since we are now well acquainted with the 
concept of passage to a limit, it is only a step further to the follow- 
ing realization. We shall obtain the best solution of our problem if 
we choose as the characteristic of variability in the immediate neighborhood of $a$ the limit

$$\lim_{h \to 0} \frac{f(a + h) - f(a)}{h} = f'(a)$$

(assuming, of course, that this limit exists).


DEFINITION. This limit is called the derivative of f(x) at the point a, or at x = a.
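Here is a small illustration (our own, not from the text). For $f(x) = x^2$ the ratio (1) equals $2a + h$: it depends on the chosen $h$, but it tends to $f'(a) = 2a$ as $h \to 0$. The Python sketch below evaluates (1) directly for a few values of $h$, both positive and negative. It also checks that for $\sin x$ at $a = 0$ the ratios approach $\cos 0 = 1$.

```python
import math


def difference_quotient(f, a: float, h: float) -> float:
    # The ratio (1): average change of f per unit change of x between a and a + h
    return (f(a + h) - f(a)) / h


def square(x: float) -> float:
    return x * x


# For f(x) = x**2 the ratio (1) equals 2a + h, so at a = 3 it tends to f'(3) = 6.
for h in (0.5, 0.1, 0.01, -0.01):
    print(h, difference_quotient(square, 3.0, h))

# For sin at a = 0 the ratios approach cos 0 = 1.
print(difference_quotient(math.sin, 0.0, 1e-4))
```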


Thus, the derivative of a function at a given point is used 
to characterize the relative rate of change of the function in the im- 
mediate neighborhood of that point: the greater the absolute value 
of f’(a), the more sensitive is f(x) to small variations of x from its 
initial value a. The sign of f'(a) characterizes the direction of the change: the quantity f'(a) is positive or negative depending upon whether f(x) increases or decreases when x acquires small positive increments to its initial value a. If we represent the function y = f(x) graphically, then the rate of change we are considering will be expressed by the steepness of ascent or descent of the curve traced as x passes through the value a. In more precise terms, the derivative is equal to the slope of the tangent to the curve y = f(x) at x = a.

The derivative finds its simplest concrete interpretation when 
the independent variable x denotes time. The quantity (1) then represents the average rate of change of y during the time interval [a, a + h], and the derivative f'(a) represents the rate of this change at the moment a. In particular, if y = f(x) denotes the distance
traversed by a moving point in the time interval from a certain 
fixed moment a to the moment x, then the concept of the deriva- 
tive is identical to the concept of instantaneous velocity in 
mechanics. 

In the mathematical treatment of the natural sciences and in 
other applications of analysis, the role played by the derivative is 
of great importance. It describes the local behavior of a phenom- 
enon in a highly important respect: it measures the variability of 
one of two related quantities with respect to the other. 


1 Another generally accepted system of notation is: 


$h = \Delta x$, $f(a + h) - f(a) = \Delta y$, and $f'(a) = \lim_{\Delta x \to 0} \dfrac{\Delta y}{\Delta x}$.
The whole computation, together with all the reasoning which we have applied to a single point a (arbitrarily chosen) in the interval [a, b], may be repeated for any point x in this interval (provided, of course, that in each case the limit in question exists). It is the function f'(x) obtained in this way that is called the derived function of f. It should be clear that what we have just said does
not in the least alter the fundamental fact that the derivative is a 
local characteristic of the given function, computed separately for 
each individual point, and used to describe the behavior of this 
function not in the whole interval but solely in the immediate 
neighborhood of its individual points. 


DEFINITION. We call a function f(x) differentiable at x = a if this function has a derivative at the point a; the function is differentiable in the interval [a, b] if the derivative exists at each point interior to the interval.


It is obvious that differentiability, like continuity, is a local 
property of the function. As you know, differentiability of a func- 
tion at x = a implies its continuity at this point. This is evident 
from the fact that, as the denominator of the fraction (1) tends to 
zero it is necessary, if the limit is to exist, that the numerator also 
tend to zero. This at once implies the continuity of f(x) at a. You 
no doubt also know that the converse is not true; a continuous function need not have a derivative. For example, the function f(x) which is represented graphically in Figure 19 (for which there is no need to give an analytical expression) is obviously continuous.

[Fig. 19]

Yet, at $x = a$ the two limits of (1) as $h \to +0$ and $h \to -0$ are unequal. (Geometrically, this is expressed by the fact that at $x = a$ the curve does not have a unique tangent.) Consequently, no limit exists as $h \to 0$.
The limits of (1) as $h \to +0$ and $h \to -0$, if they exist, are called, respectively, the right-hand and left-hand derivatives of $f(x)$ at $x = a$. For a function to be differentiable at a point $a$, it is evidently necessary and sufficient that the right-hand and left-hand derivatives both exist and be equal. Figure 19 gives us the simplest example of the failure of a continuous function to have a derivative at some point; however, both the right-hand and left-hand derivatives exist there. It is natural to ask whether we always have such a situation, and it is easy to see that there are cases where the function is nondifferentiable in a much deeper sense. Figure 20 represents graphically the behavior of the function $y = x\sin\frac{1}{x}$ in the vicinity of $x = 0$.

[Fig. 20]

Since we have

$$\left|x \sin\frac{1}{x}\right| \le |x| \to 0$$

as $x \to 0$, it follows (setting $y = 0$ at $x = 0$) that the function $y$ is continuous at the point 0. As $x$ tends to zero (say, from the right), $\frac{1}{x}$ tends to $+\infty$ and, consequently, $\sin\frac{1}{x}$ oscillates infinitely many times between $+1$ and $-1$. As a result of this, $x\sin\frac{1}{x}$ oscillates infinitely many times between the lines $y = x$ and $y = -x$. The expression (1), which in the present case is equal to $\sin\frac{1}{x}$ (for $a = 0$ and $h = x$), has as partial limits all the numbers in the interval $[-1, +1]$; its upper limit is equal to $+1$, its lower limit to $-1$. And precisely the same thing happens as $x \to 0$ from the left. Hence, we have in this case neither a right-hand nor a left-hand derivative at $x = 0$.


DEFINITION. The derivates (or derived numbers) of the function at the given point are defined as the four limits: an upper limit and a lower limit of the expression (1) as $h \to +0$ and as $h \to -0$. Each limit may be either a number or $\pm\infty$.


Thus, every function has at each point (interior to a neighbor- 
hood in which it is defined) four derivates: right upper, right 
lower, left upper, and left lower. If the two right (left) derivates 
coincide, then the function has, at the given point, a right-hand 
(left-hand) derivative. When all four of the derivates are finite and 
equal (and only in this case), the function is differentiable at 
the given point. An example in which all four derivates are infinite is given for the point $x = 0$ by the function

$$y = \begin{cases} \sqrt{|x|}\,\sin\dfrac{1}{x} & \text{if } x \ne 0, \\[4pt] 0 & \text{if } x = 0, \end{cases}$$

as you can easily verify.
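These claims about the derivates at $x = 0$ can be probed numerically. Consider the sequences $h_k = 1/(\frac{\pi}{2} + 2\pi k)$ and $h_k = 1/(\frac{3\pi}{2} + 2\pi k)$, both of which tend to $+0$. On them, $\sin\frac{1}{h_k}$ equals $+1$ and $-1$ respectively. So the quotient (1) for $x\sin\frac{1}{x}$ equals $\pm 1$, while for $\sqrt{|x|}\sin\frac{1}{x}$ it equals $\pm 1/\sqrt{h_k}$, which is unbounded in both directions. The Python sketch below is illustrative only: finitely many values can suggest an upper or lower limit but cannot prove it.

```python
import math


def quotient_at_zero(f, h: float) -> float:
    # The ratio (1) at a = 0, where both functions are defined to be 0
    return (f(h) - f(0.0)) / h


def x_sin(x: float) -> float:
    return x * math.sin(1.0 / x) if x != 0 else 0.0


def sqrt_sin(x: float) -> float:
    return math.sqrt(abs(x)) * math.sin(1.0 / x) if x != 0 else 0.0


ks = (10, 100, 1000)
# Sequences h -> +0 on which sin(1/h) equals +1 and -1, respectively
h_up = [1.0 / (math.pi / 2 + 2 * math.pi * k) for k in ks]
h_down = [1.0 / (3 * math.pi / 2 + 2 * math.pi * k) for k in ks]

print([quotient_at_zero(x_sin, h) for h in h_up])       # each close to +1
print([quotient_at_zero(x_sin, h) for h in h_down])     # each close to -1
print([quotient_at_zero(sqrt_sin, h) for h in h_up])    # growing without bound
print([quotient_at_zero(sqrt_sin, h) for h in h_down])  # decreasing without bound
```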

If we consider that a function may have totally different types of 
differentiability at different points, we can see how complicated this 
aspect of a function’s behavior can become. Indeed, phenomena 
similar to those indicated in Figure 20 are not necessarily confined 
to individual isolated points: there exist continuous functions 
having equally complicated structure in the vicinity of every point 
(and which are, consequently, nowhere differentiable). Unfortunately, the scope of these lectures does not permit us to dwell on
any of the large number of examples of this kind which have been 
constructed. 

Instead, let us consider an example of the application of 
derivates. The reader knows, of course, that the behavior of differentiable functions is closely connected with the sign of the derivative. In particular, if for all the points in an interval $[a, b]$ we have $f'(x) \ge 0$ (or $\le 0$), then the function $f(x)$ is nondecreasing (or nonincreasing) in this interval. These criteria certainly leave nothing to be desired where the given function has a derivative everywhere in the interval. But in the case of nondifferentiable functions they, of course, give us no information. On closer examination, however, we find that a criterion of this kind exists even for completely nondifferentiable functions.


THEOREM 1. Let $f(x)$ be continuous in the interval $[a, b]$ and let one of the four derivates, which we shall denote by $Df(x)$, be nonnegative for all $x$ in this interval. Then $f(b) \ge f(a)$.


It is obvious that this criterion is substantially stronger than the 
one given above, since it is applicable to any continuous, and 
in general nondifferentiable, function. 


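Proof. Let us assume, contrary to the assertion of the theorem, that $f(b) < f(a)$, and let $\varepsilon$ be a number such that

$$\frac{f(a) - f(b)}{b - a} > \varepsilon > 0;$$

for definiteness, let $Df(x)$ be the right upper derivate of $f(x)$. We set $\varphi(x) = f(x) - f(a) + \varepsilon(x - a)$; clearly, $\varphi(a) = 0$ and

$$\varphi(b) = (b - a)\left(\varepsilon - \frac{f(a) - f(b)}{b - a}\right) < 0. \qquad (2)$$

Let $M$ be the set of points in $[a, b]$ at which $\varphi(x) = 0$, and let $\alpha$ be the l.u.b. of this set. If we had $\varphi(\alpha) > 0$ (or $\varphi(\alpha) < 0$), the point $\alpha$ would be contained in a neighborhood $U$ where $\varphi(x) > 0$ for all $x$ in $U$ (or, correspondingly, $\varphi(x) < 0$ for all $x$ in $U$). On the other hand, by the definition of least upper bound, every neighborhood of $\alpha$ contains points at which $\varphi(x) = 0$. This contradiction shows that $\varphi(\alpha) = 0$. Now, since $\varphi(b) < 0$, we have $\alpha < b$, and we clearly have $\varphi(x) < 0$ for $\alpha < x \le b$ (otherwise, by the continuity of $\varphi$, it would vanish at some point to the right of $\alpha$). For any such value of $x$ we have

$$\frac{\varphi(x) - \varphi(\alpha)}{x - \alpha} < 0.$$

It follows that $D\varphi(\alpha) \le 0$. But

$$D\varphi(\alpha) = Df(\alpha) + \varepsilon,$$

whence

$$Df(\alpha) = D\varphi(\alpha) - \varepsilon < 0,$$

which contradicts our assumption. The theorem is thus proved.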
32. THE DIFFERENTIAL


The next fundamental notion in the theory of differentiation is 
that of the differential. Today we consider the differential a second- 
ary concept defined by means of the derivative, but it was not 
always so. At the period of inception of the infinitesimal calculus 
and for a long time thereafter, it was the differential which was 
considered the primary notion in analysis. The derivative was de- 
fined as the ratio of differentials, that is to say, as a secondary con- 
cept. In so doing, the notion of the differential was often left 
without a precise definition. It even contained within itself contra- 
dictory features, since, at that time, mathematical thought, though 
capable of grasping as a single object of thought a variable quantity 
in its process of change, had not yet attained sufficient development. 

You know, of course, the formal definition of the differential: 


DEFINITION. The differential of $y = f(x)$ is the quantity

$$dy = f'(x)\,\Delta x, \qquad (3)$$

where $\Delta x$ is the increment of the independent variable.


The differential of $f(x)$ is thus a function of two variables, the quantity $x$ and its increment $\Delta x$. The values of these two variables are not connected in any way and can be chosen independently of each other.

Taking, in particular, the function $y = x$ we see that $dx = \Delta x$, i.e., for the independent variable the differential and the increment always coincide. Substituting $dx$ for $\Delta x$ in (3), we find that

$$dy = f'(x)\,dx.$$

This gives

$$y' = f'(x) = \frac{dy}{dx},$$

that is, the derivative is equal to the ratio of the differential of the function and the differential of the independent variable.
However, all these purely formal considerations fail to bring 
out the tremendous significance which the notion of the differential 
has for analysis and its applications. To get a better understanding 
of this matter, it is necessary to look more closely into the essence 
of this notion. Perhaps it will be most enlightening if we begin by 
considering the special case in which x denotes time and y = f(x) 
denotes the distance traversed by a moving point in the time interval from 0 to $x$. In this case, as we know, the derivative $f'(x)$ signifies the instantaneous velocity of the moving point at the moment $x$. It follows that $dy = f'(x)\,\Delta x$ is the length of the path which the moving point would cover in the time interval $[x, x + \Delta x]$ if during that time it moved uniformly and with the same velocity it had at the moment $x$. The actual distance covered during that time interval will, in general, be different, since the velocity does not remain constant.

In general, the derivative $f'(x)$ is considered as a measure of the relative variability of the quantities $y$ and $x$ at a given value of $x$. The differential $dy = f'(x)\,\Delta x$ can, therefore, be considered as the increment which $y$ would receive by changing $x$ to the value $x + \Delta x$, if, at all points of the interval $[x, x + \Delta x]$, the relative variability continued to be the same as at the point $x$. This idea is represented graphically in Figure 21: $dy$ is the increment which the ordinate of the curve $y = f(x)$ would have received between $x$ and $x + \Delta x$ if the slope of the curve in that whole interval were the same as at the point $x$, in other words, if we replaced the curve by the tangent at the point whose abscissa is equal to $x$.

[Fig. 21]

Many of you undoubtedly know that in the applied sciences we frequently do not distinguish between the increment $\Delta y$ and the differential $dy$ of a function for small values of $\Delta x$. Sometimes this is the source of a wrong and harmful idea that the differential is an infinitely small increment, when, in reality, the differential is neither an increment nor an infinitely small quantity.
It will be useful for us to try to answer two questions which naturally arise in connection with this substitution of the differential for the increment: first, to what degree is such a substitution justified, and second, what advantage might it offer? To answer the first question, we shall start with the relation

$$f'(x) = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x}.$$

Let us denote by $\alpha$ the difference $\frac{\Delta y}{\Delta x} - f'(x)$, which by virtue of the above relationship is infinitesimal as $\Delta x \to 0$. This gives

$$\Delta y = f'(x)\,\Delta x + \alpha\,\Delta x = dy + \alpha\,\Delta x. \qquad (3')$$

Since $\alpha \to 0$ as $\Delta x \to 0$, the product $\alpha\,\Delta x$ is an infinitesimal of a higher order than $\Delta x$. But, for a fixed $x$, the ratio $\frac{dy}{\Delta x} = f'(x)$ is a constant quantity, and the ratio $\frac{\Delta y}{\Delta x}$ as $\Delta x \to 0$ has this constant as its limit. Consequently, provided only that $f'(x) \ne 0$, all three quantities $\Delta x$, $\Delta y$, and $dy$ are infinitesimals of the same order. Therefore, $\alpha\,\Delta x$, being an infinitesimal of higher order than $\Delta x$, is at the same time an infinitesimal of higher order than either of the quantities $\Delta y$ and $dy$. Thus the relation (3') shows that if $f'(x) \ne 0$, the difference between the increment and the differential of the function as $\Delta x \to 0$ is an infinitesimal of higher order than either of these quantities by itself. In other words, by replacing the increment with the differential (or vice versa) we incur only an infinitely small relative error.

The relation (3') is obviously equivalent, when $f'(x) \ne 0$, to the relation

$$\frac{\Delta y}{dy} = 1 + \frac{\alpha}{f'(x)},$$

from which it follows immediately that $\frac{\Delta y}{dy} \to 1$ as $\Delta x \to 0$, and, thus, $\Delta y$ and $dy$ are equivalent infinitesimals. These results justify the possibility of replacing a small increment $\Delta y$ of a function by the differential of the function, as an approximation to the increment.
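A concrete case makes (3') tangible. For $y = x^3$ at $x = 1$ (our own example), $\Delta y = 3\Delta x + 3\Delta x^2 + \Delta x^3$ and $dy = 3\Delta x$, so $\alpha = 3\Delta x + \Delta x^2 \to 0$ and $\Delta y / dy \to 1$. The Python sketch below computes the increment, the differential and $\alpha$ for a few shrinking $\Delta x$.

```python
def cube(t: float) -> float:
    return t ** 3


def cube_prime(t: float) -> float:
    return 3 * t ** 2


def compare(x: float, dx: float):
    # Returns (increment, differential, alpha) with increment = differential + alpha * dx.
    increment = cube(x + dx) - cube(x)
    differential = cube_prime(x) * dx
    alpha = (increment - differential) / dx
    return increment, differential, alpha


for dx in (0.1, 0.01, 0.001):
    increment, differential, alpha = compare(1.0, dx)
    # For y = x**3 at x = 1, alpha = 3*dx + dx**2, and increment / differential -> 1.
    print(dx, increment, differential, alpha, increment / differential)
```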
To answer the second question, what advantage is obtained by
replacing the increment of a function by the differential, we
observe that the computation of the differential is theoretically
simpler and in practice more convenient than the computation of
the increment. The differential dy is a linear function of the
quantity Δx; its variation as Δx changes is exceptionally simple,
and its use requires nothing more than the computation of f'(x) at
a single point. Clearly, nothing like this holds for the quantity
Δy. Imagine, for example, that we want to construct a table of the
values of the function y = sin x for values very close to 60°, e.g.,
60°01', 60°02', and so on, but that we have no means for the
precise computation of these quantities. That is, we know that for
x = 60° we have y = √3/2, but, on passing to other, nearby, values
of x, we do not have a method of finding the corresponding
increments Δy of the quantity y. Using the fact that the increments
Δx in this problem are small, let us replace the increments Δy by
the differentials dy. Since y' = cos 60° = 1/2, we have dy = ½Δx,
where Δx is expressed in radian measure (1' = π/[(180)(60)]
radians). From this we obtain at once

$$\sin 60^\circ 01' \approx \frac{\sqrt{3}}{2} + \frac{1}{2}\cdot\frac{\pi}{(180)(60)},$$
$$\sin 60^\circ 02' \approx \frac{\sqrt{3}}{2} + \frac{1}{2}\cdot\frac{2\pi}{(180)(60)},$$
$$\sin 60^\circ 03' \approx \frac{\sqrt{3}}{2} + \frac{1}{2}\cdot\frac{3\pi}{(180)(60)}.$$
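As a check on this small table (our addition, not part of the lecture), the Python sketch below computes y + dy at 60°01', 60°02', and 60°03' and compares the result with the library value of sin. The names `sin_by_differential` and `ARC_MINUTE` are illustrative.

```python
import math

# Known values at the base point x0 = 60 degrees.
SIN_60 = math.sqrt(3) / 2
COS_60 = 0.5
ARC_MINUTE = math.pi / (180 * 60)  # one minute of arc, in radians


def sin_by_differential(minutes: int) -> float:
    """Approximate sin(60 deg + `minutes` arc minutes) by y + dy, with dy = cos(60 deg) * dx."""
    dx = minutes * ARC_MINUTE
    return SIN_60 + COS_60 * dx


for k in (1, 2, 3):
    approx = sin_by_differential(k)
    exact = math.sin(math.radians(60 + k / 60))
    print(f"sin 60deg {k:02d}'  y + dy = {approx:.10f}  sin = {exact:.10f}  error = {approx - exact:.1e}")
```

The error grows roughly like (Δx)², in line with the discussion of (3'). For these three entries it stays below 10⁻⁶.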


As we have seen, the differential dy has the following two
remarkable properties: one, it is a linear function of Δx; and two,
it differs from Δy by an infinitesimal of a higher order than Δx. We
shall now show that these two properties completely characterize
the differential, so that one could begin the study of the differential
by defining it as a function of x and Δx having these two properties.

For this purpose, let dy = aΔx + b, where a and b are constants
independent of Δx, and let Δy − dy = αΔx, where α → 0 as Δx → 0.
Hence, Δy = dy + αΔx = (a + α)Δx + b. If the function y is
continuous (which we assume), then Δy → 0 as Δx → 0, whence,
necessarily, b = 0 and Δy = (a + α)Δx. Accordingly,

$$\frac{\Delta y}{\Delta x} = a + \alpha, \qquad \lim_{\Delta x \to 0}\frac{\Delta y}{\Delta x} = f'(x) = a, \qquad dy = f'(x)\,\Delta x.$$

One of the important properties of the differential (though 
many textbooks do not stress it sufficiently) is its so-called invari- 
ance with respect to a transformation of the independent variable: 
If x is an independent variable then, as we have seen, 


$$dy = f'(x)\,\Delta x = f'(x)\,dx. \qquad (4)$$


Let us now consider x as a function x = φ(t) of a new variable t.
It is clear that now both of the relationships (4) cannot
simultaneously be valid, since now (in general)

$$dx = \varphi'(t)\,\Delta t \neq \Delta x.$$
It is worth noting that the second relation, that is, 


$$dy = f'(x)\,dx,$$


remains valid after any such change of the independent variable. 
For, as a result of such a change, y becomes a function of t:

$$y = \psi(t) = f[\varphi(t)].$$

Using the rule for differentiation of composite functions, we have

$$\psi'(t) = f'[\varphi(t)]\,\varphi'(t), \qquad dy = \psi'(t)\,dt = f'[\varphi(t)]\,\varphi'(t)\,dt,$$

and from φ(t) = x and φ'(t) dt = dx it follows that dy = f'(x) dx,
precisely as if x were the independent variable. Hence, it is
immaterial whether the variable x in the expression for the
derivative y' = dy/dx is independent or is a function of another
variable.

We have just employed the rule for the differentiation of a com- 
posite function. Generally speaking, we do not intend in these lec- 
tures to discuss elementary rules for differentiation, but this 
particular rule deserves attention. In the majority of textbooks, its 
proof is either incorrect or unnecessarily complicated. Here is a 
simple yet rigorous proof:¹


¹ Communicated by M. A. Kreines.
Let y = f(x), x = φ(t), and y = ψ(t) = f[φ(t)]. We assume, of
course, that the functions f and φ are differentiable, so that in (3')
we have α → 0 as Δx → 0 (we set α = 0 when Δx = 0, so that (3')
holds for every increment Δt). It follows from (3') that

$$\frac{\Delta y}{\Delta t} = f'(x)\,\frac{\Delta x}{\Delta t} + \alpha\,\frac{\Delta x}{\Delta t}.$$

But as Δt → 0, we have Δx → 0 and, consequently, α → 0, while
Δx/Δt → φ'(t). Passing to the limit, we find that Δy/Δt also has a
limit, namely

$$\psi'(t) = f'(x)\,\varphi'(t) = f'[\varphi(t)]\,\varphi'(t),$$

which is precisely the rule that we wished to derive.
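The following Python sketch gives a numerical illustration of the composite-function rule. It is not a proof. With the hypothetical choices f(x) = sin x and x = φ(t) = t², it compares the difference quotient Δy/Δt of ψ(t) = f[φ(t)] with f'[φ(t)]φ'(t).

```python
import math


def f(x: float) -> float:
    return math.sin(x)


def f_prime(x: float) -> float:
    return math.cos(x)


def phi(t: float) -> float:
    return t * t


def phi_prime(t: float) -> float:
    return 2 * t


def psi(t: float) -> float:
    """The composite function psi(t) = f[phi(t)]."""
    return f(phi(t))


def chain_rule(t: float) -> float:
    """psi'(t) as given by the rule: f'[phi(t)] * phi'(t)."""
    return f_prime(phi(t)) * phi_prime(t)


t = 1.3
for dt in (1e-1, 1e-3, 1e-5):
    quotient = (psi(t + dt) - psi(t)) / dt  # Delta y / Delta t
    print(f"dt = {dt:.0e}: quotient = {quotient:.8f}, rule gives {chain_rule(t):.8f}")
```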


33. LAGRANGE’S THEOREM (FIRST MEAN VALUE THEOREM) 


The construction of the differential calculus is based, to a
considerable degree, upon Lagrange's theorem. It represents the first
example of the so-called mean value theorems, to which, in view of
their exceptional importance in all fields of analysis, we shall have
to give considerable attention.


THEOREM 2 (Lagrange). If f(x) is continuous on the closed interval
[a, b] and differentiable within that interval, then there exists
a point c, a < c < b, such that

$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$


Proof. It is obvious that the function

$$\varphi(x) = (b - a)[f(x) - f(a)] - (x - a)[f(b) - f(a)]$$

is differentiable within the interval [a, b]. Since φ(a) = φ(b) = 0,
either φ(x) = 0 for all x ∈ [a, b], whereupon we have φ'(x) = 0
identically, or else φ(x) assumes, within the interval, values
different from zero. Among these values there exists either a
maximum or a minimum (since the function φ(x) is continuous). Let us
suppose, for definiteness, that the function φ(x) attains its maximum
value at x = c, where a < c < b. Necessarily, φ'(c) = 0, for if
φ'(c) > 0 we would have, for sufficiently small positive h,

$$\frac{\varphi(c + h) - \varphi(c)}{h} > 0.$$

Consequently, φ(c + h) > φ(c), contrary to our definition of the
point c. If φ'(c) < 0, we obtain a similar contradiction by assigning
negative values to h. From φ'(c) = (b − a)f'(c) − [f(b) − f(a)] = 0
the desired relation follows, and the theorem is proved.


Frequently we formulate this theorem using another system of
notation, letting the end points of the given interval be x and
x + Δx, where Δx > 0. Then we speak of the existence of a number
θ (0 < θ < 1) such that

$$f'(x + \theta\,\Delta x)\,\Delta x = f(x + \Delta x) - f(x).$$
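For f(x) = eˣ the number θ can be found in closed form. The condition e^(x + θΔx)·Δx = e^(x + Δx) − eˣ gives θ = ln[(e^Δx − 1)/Δx]/Δx, which does not depend on x. The Python sketch below evaluates this θ. The choice of function is ours, not the text's. The output shows that θ lies in (0, 1) and approaches 1/2 as Δx → 0.

```python
import math


def theta_for_exp(dx: float) -> float:
    """Theta in f(x + dx) - f(x) = f'(x + theta*dx) * dx for f = exp and dx > 0.

    From e^(theta*dx) = (e^dx - 1) / dx we get theta = ln((e^dx - 1) / dx) / dx.
    math.expm1 computes e^dx - 1 without cancellation when dx is small.
    """
    return math.log(math.expm1(dx) / dx) / dx


for dx in (2.0, 1.0, 0.1, 0.001):
    print(f"dx = {dx}: theta = {theta_for_exp(dx):.6f}")
```

Expanding the logarithm gives θ ≈ 1/2 + Δx/24 for small Δx, which is consistent with the printed values.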


For functions having derivatives of higher order, there exists an 
important generalization of Lagrange’s theorem: 


THEOREM 3. If y = f(x) has a continuous derivative of order
n − 1 throughout the closed interval [x, x + nΔx] and has, within
this interval, a derivative of order n,¹ then there exists a number
θ (0 < θ < 1) such that

$$\Delta^n y = f^{(n)}(x + \theta n\,\Delta x)\,(\Delta x)^n.$$


The symbol Δⁿy denotes here the so-called nth order difference
of y = f(x). The first order difference is simply the increment
Δy = f(x + Δx) − f(x). We define the difference of the (n + 1)th
order as the first order difference of the nth order difference,
so that

$$\Delta^2 y = \Delta(\Delta y) = [f(x + \Delta x + \Delta x) - f(x + \Delta x)] - [f(x + \Delta x) - f(x)] = f(x + 2\Delta x) - 2f(x + \Delta x) + f(x),$$
$$\Delta^3 y = f(x + 3\Delta x) - 3f(x + 2\Delta x) + 3f(x + \Delta x) - f(x),$$

and so on. In general, as is easy to prove by mathematical induction,

$$\Delta^n y = f(x + n\Delta x) - n f[x + (n-1)\Delta x] + \frac{n(n-1)}{2!}\, f[x + (n-2)\Delta x] - \cdots + (-1)^{n-1} n f(x + \Delta x) + (-1)^n f(x).$$
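The Python sketch below checks the closed form for Δⁿy against the recursive definition and illustrates Theorem 3 in one case. The helper names and test functions are our own illustrative choices. For n = 2 and f = sin, Theorem 3 places Δ²y/(Δx)² among the values of f'' = −sin on [x, x + 2Δx].

```python
import math


def nth_difference_recursive(f, x, dx, n):
    """Delta^n f(x) from the definition Delta^n f(x) = Delta^(n-1) f(x + dx) - Delta^(n-1) f(x)."""
    if n == 0:
        return f(x)
    return nth_difference_recursive(f, x + dx, dx, n - 1) - nth_difference_recursive(f, x, dx, n - 1)


def nth_difference_binomial(f, x, dx, n):
    """Closed form: sum over k of (-1)^k * C(n, k) * f(x + (n - k) dx)."""
    return sum((-1) ** k * math.comb(n, k) * f(x + (n - k) * dx) for k in range(n + 1))


# Both forms agree (up to rounding) for an arbitrary smooth function.
print(nth_difference_recursive(math.sin, 1.0, 0.1, 4), nth_difference_binomial(math.sin, 1.0, 0.1, 4))

# Theorem 3 with f = sin, n = 2: the ratio equals -sin(x + 2*theta*dx) for some theta in (0, 1).
# On [1.0, 1.2] the function -sin decreases, so the ratio lies between -sin(1.2) and -sin(1.0).
x, dx = 1.0, 0.1
ratio = nth_difference_binomial(math.sin, x, dx, 2) / dx ** 2
print(ratio, -math.sin(x + 2 * dx), -math.sin(x))
```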


¹ At the end points of the interval it suffices that there exist
one-sided derivatives of the order n − 1.

Proof. For n = 1, the theorem to be proved coincides with
Lagrange's theorem. Let us now assume that the theorem is true for
differences of order n. We then show it is true for differences of
order n + 1. Thus we assume that y = f(x) has a continuous
derivative of the nth order throughout the closed interval
[x, x + (n + 1)Δx] and a derivative of order n + 1 in its interior.
Consequently, the function Δy = f(x + Δx) − f(x) has a derivative
of the nth order throughout the interval [x, x + nΔx]. We can
therefore apply our inductive hypothesis to its nth order
difference, obtaining


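$$\Delta^n(\Delta y) = (\Delta x)^n\{f^{(n)}(x + \Delta x + \theta_1 n\,\Delta x) - f^{(n)}(x + \theta_1 n\,\Delta x)\},$$

where 0 < θ₁ < 1. But it follows from our assumptions that f⁽ⁿ⁾(x)
is continuous on the closed interval [x + θ₁nΔx, x + (1 + θ₁n)Δx]
and differentiable in its interior. Therefore, applying Lagrange's
theorem to the expression in braces in the above equality, we find

$$\Delta^{n+1} y = (\Delta x)^{n+1} f^{(n+1)}(x + \theta_1 n\,\Delta x + \theta_2\,\Delta x) = (\Delta x)^{n+1} f^{(n+1)}(x + \theta[n + 1]\,\Delta x),$$

where 0 < θ₂ < 1 and θ = (nθ₁ + θ₂)/(n + 1) again satisfies
0 < θ < 1; q.e.d.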


Another important generalization of Lagrange's theorem is
contained in Cauchy's remarkable formula:


CAUCHY'S FORMULA. If two functions f₁(x) and f₂(x) are continuous
on the closed interval [a, b] and differentiable in its interior,
and if f₂'(x) never assumes the value zero inside the interval,
then there exists a point c in the interior of [a, b] such that

$$\frac{f_1(b) - f_1(a)}{f_2(b) - f_2(a)} = \frac{f_1'(c)}{f_2'(c)}.$$


The proof of this general formula (which for f₂(x) = x reduces
to Lagrange’s theorem) is carried out in the same manner as that 
of Lagrange’s theorem. 


Proof. Setting φ(x) = f₁(x)[f₂(b) − f₂(a)] − f₂(x)[f₁(b) − f₁(a)],
we find that φ(a) = φ(b). Hence, the function φ(x) is either
constant on [a, b] or attains in the interior of this interval a
maximum or minimum value. Therefore, by precisely the same reasoning
as before, we conclude that, at some point c interior to [a, b], we
must have φ'(c) = 0, from which the desired formula follows.





Let us further note that the left side of Cauchy's formula cannot
turn out to be meaningless, since if f₂(a) = f₂(b), then, by
virtue of Theorem 2, we would have f₂'(x) = 0 at some interior
point of [a, b]; this is excluded by the hypothesis of the theorem.
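Cauchy's formula only asserts that c exists. The Python sketch below locates such a point numerically when g(x) = f₁'(x)[f₂(b) − f₂(a)] − f₂'(x)[f₁(b) − f₁(a)], the derivative of the auxiliary function of the proof, changes sign on [a, b]. That sign change is an assumption of the sketch, and bisection is our choice of method, not the text's. Taking f₂(x) = x recovers Lagrange's theorem.

```python
import math


def cauchy_point(f1, f2, df1, df2, a, b, tol=1e-12):
    """Find c in (a, b) with [f1(b) - f1(a)] / [f2(b) - f2(a)] = f1'(c) / f2'(c).

    Bisection on g(c) = f1'(c) [f2(b) - f2(a)] - f2'(c) [f1(b) - f1(a)];
    assumes g(a) and g(b) have opposite signs (otherwise raises ValueError).
    """
    d1 = f1(b) - f1(a)
    d2 = f2(b) - f2(a)

    def g(c):
        return df1(c) * d2 - df2(c) * d1

    lo, hi = a, b
    g_lo = g(lo)
    if g_lo * g(hi) > 0:
        raise ValueError("g does not change sign on [a, b]")
    for _ in range(200):  # cap the iterations in case tol is below float spacing
        if hi - lo <= tol:
            break
        mid = (lo + hi) / 2
        g_mid = g(mid)
        if g_lo * g_mid <= 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return (lo + hi) / 2


# f1 = x^3, f2 = x^2 on [1, 2]: 7/3 = 3c^2 / (2c), so c = 14/9.
c = cauchy_point(lambda x: x ** 3, lambda x: x ** 2, lambda x: 3 * x ** 2, lambda x: 2 * x, 1.0, 2.0)
print(c, 14 / 9)

# Lagrange's theorem as the special case f2(x) = x: f = x^3 on [0, 1] gives 3c^2 = 1.
c_lagrange = cauchy_point(lambda x: x ** 3, lambda x: x, lambda x: 3 * x ** 2, lambda x: 1.0, 0.0, 1.0)
print(c_lagrange, 1 / math.sqrt(3))
```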
34. DERIVATIVES AND DIFFERENTIALS OF HIGHER ORDER


Derivatives and differentials of higher order are defined, as you
know, by induction: the derivative of order n is obtained by
differentiating the derivative of order n − 1, and similarly for
differentials.

The higher order derivatives and differentials of a function 
y = f(x) are connected by relations of the form 


$$d^n y = f^{(n)}(x)\,dx^n \qquad (5)$$

(where dⁿy denotes the differential of dⁿ⁻¹y and dxⁿ means the nth
power of the first differential of the independent variable), which
are also proved by induction. For if x is the independent variable,
then dx = Δx does not depend on x and, therefore, by differentiating
the relation (5) we find (the prime mark ' indicating differentiation
with respect to x):

$$d^{n+1} y = d(d^n y) = (d^n y)'\,dx = f^{(n+1)}(x)\,dx^{n+1}.$$


As we have seen earlier, for n = 1 the relation (5) remains true
when x (and hence y also) is a differentiable function of an
independent variable t: x = φ(t). It is easy to see that for n = 2
the situation is already different: differentials of higher order are
not invariant with respect to a transformation of the independent
variable. Denoting the second differential of y = f(x) by (d²y)ₓ or
(d²y)ₜ, depending on the choice of x or t as the independent
variable, we have

$$(d^2 y)_x = f''(x)\,dx^2,$$

while

$$\begin{aligned}(d^2 y)_t &= \frac{d^2 f[\varphi(t)]}{dt^2}\,dt^2 = \frac{d\{f'[\varphi(t)]\,\varphi'(t)\}}{dt}\,dt^2 \\ &= \{f''[\varphi(t)]\,\varphi'^2(t) + f'[\varphi(t)]\,\varphi''(t)\}\,dt^2 = f''(x)\,dx^2 + f'[\varphi(t)]\,\varphi''(t)\,dt^2.\end{aligned}$$

We see that these two expressions are different: the second one
contains an additional term f'[φ(t)]φ''(t)dt². This term is absent
in the first expression and is identically zero only if x is a
linear function of the variable t, φ(t) = at + b. Hence the second
differential, in contrast to the first, remains invariant only with
respect to linear transformations of the independent variable.
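The extra term can also be seen numerically. In the Python sketch below, the choices f(x) = sin x, φ(t) = t², and the point t = 0.8 are illustrative assumptions. A central second difference approximates (d²y)ₜ/dt². It is compared with f''(x)φ'(t)², the value that invariance of (5) would predict, and with f''(x)φ'(t)² + f'(x)φ''(t). A linear substitution is included for contrast.

```python
import math


def second_derivative(g, t, h=1e-4):
    """Central difference approximation to g''(t)."""
    return (g(t + h) - 2 * g(t) + g(t - h)) / (h * h)


f, df, d2f = math.sin, math.cos, lambda x: -math.sin(x)

# Nonlinear substitution x = phi(t) = t^2, with phi'(t) = 2t and phi''(t) = 2.
t = 0.8
x = t * t
numeric = second_derivative(lambda s: f(s * s), t)
naive = d2f(x) * (2 * t) ** 2      # f''(x) (dx/dt)^2 alone
extra = df(x) * 2.0                # the additional term f'[phi(t)] phi''(t)
print(numeric, naive, naive + extra)

# Linear substitution x = 3t + 1: phi'' = 0, so no extra term appears.
x_lin = 3 * t + 1
numeric_lin = second_derivative(lambda s: f(3 * s + 1), t)
print(numeric_lin, d2f(x_lin) * 3 ** 2)
```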
It is also possible to define derivatives of higher order in
another way, connecting them with differences of corresponding
order. It is possible to show that

$$\lim_{\Delta x \to 0} \frac{\Delta^n y}{(\Delta x)^n} = f^{(n)}(x). \qquad (6)$$

This is similar to the method used in defining the first derivative as

$$f'(x) = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x}.$$

If f⁽ⁿ⁾(x) is assumed to be continuous, then (6) follows directly
from Theorem 3, but it is important to prove that the validity of
(6) does not depend upon this assumption.

Since (6) is valid for n = 1, it is only necessary to show
that, assuming this relation is valid for n, it must also be valid for
n + 1 (presupposing, of course, that y = f(x) has a derivative
of order n + 1). Let us then assume the relation (6) for a given n
(and for any function y differentiable n times). Since the function
Δy = f(x + Δx) − f(x) has a derivative of order n, we may apply
Theorem 3 to it, obtaining (as we have already seen in the proof of
Theorem 3)

$$\Delta^n(\Delta y) = (\Delta x)^n\,(\Delta y)^{(n)}\Big|_{x + \theta n\Delta x} = (\Delta x)^n\{f^{(n)}(x + \theta n\,\Delta x + \Delta x) - f^{(n)}(x + \theta n\,\Delta x)\}, \qquad (7)$$

where 0 < θ < 1. But because of the existence of the derivative
f⁽ⁿ⁺¹⁾(x), we have

$$f^{(n)}(x + \theta n\,\Delta x + \Delta x) - f^{(n)}(x) = (1 + n\theta)\,\Delta x\,\{f^{(n+1)}(x) + \alpha_1\}$$

and

$$f^{(n)}(x + \theta n\,\Delta x) - f^{(n)}(x) = n\theta\,\Delta x\,\{f^{(n+1)}(x) + \alpha_2\},$$

where α₁ → 0 and α₂ → 0 as Δx → 0. Consequently,

$$f^{(n)}(x + \theta n\,\Delta x + \Delta x) - f^{(n)}(x + \theta n\,\Delta x) = \Delta x\,f^{(n+1)}(x) + \alpha_1(1 + n\theta)\,\Delta x - \alpha_2 n\theta\,\Delta x.$$

The relation (7) thus gives us

$$\frac{\Delta^{n+1} y}{(\Delta x)^{n+1}} = f^{(n+1)}(x) + \alpha_1(1 + n\theta) - \alpha_2 n\theta,$$

and letting Δx → 0, we obtain

$$\lim_{\Delta x \to 0} \frac{\Delta^{n+1} y}{(\Delta x)^{n+1}} = f^{(n+1)}(x).$$
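Relation (6) can be watched numerically. For the illustrative choice f(x) = eˣ at x = 0, one has Δⁿy = (e^Δx − 1)ⁿ exactly, so the ratio Δⁿy/(Δx)ⁿ = [(e^Δx − 1)/Δx]ⁿ should approach f⁽ⁿ⁾(0) = 1. The Python sketch below computes Δⁿy from the binomial formula for the nth difference and compares the result with that closed form.

```python
import math


def nth_difference(f, x, dx, n):
    """Delta^n f(x) via the binomial formula: sum of (-1)^k C(n, k) f(x + (n - k) dx)."""
    return sum((-1) ** k * math.comb(n, k) * f(x + (n - k) * dx) for k in range(n + 1))


n, x0 = 3, 0.0
for dx in (1e-1, 1e-2, 1e-3):
    ratio = nth_difference(math.exp, x0, dx, n) / dx ** n
    exact = (math.expm1(dx) / dx) ** n  # closed form of the same ratio for f = exp
    print(f"dx = {dx:.0e}: ratio = {ratio:.7f}, closed form = {exact:.7f}")
```

Taking Δx much smaller makes the computed Δⁿy lose accuracy through cancellation. In floating point the limit can therefore only be approached, not reached.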
35. LIMITS OF RATIOS OF INFINITELY SMALL
AND INFINITELY LARGE QUANTITIES 


From among the applications of the formulas of Lagrange and 
Cauchy let us first of all consider the important problem which 
often appears in courses in analysis under the meaningless title of 
the evaluation of indeterminate expressions. The reference here is to 
the application of the methods of differential calculus to comput- 
ing the limits of ratios of two infinitely small or two infinitely large 
quantities. 

Let us assume that f₁(a) = f₂(a) = 0, that in some neighborhood
of the point a both functions are differentiable, and that
f₂'(x) ≠ 0 for x ≠ a in some neighborhood of a. Since for
f₁(a) = f₂(a) = 0 the Cauchy formula gives

$$\frac{f_1(a + h)}{f_2(a + h)} = \frac{f_1'(a + \theta h)}{f_2'(a + \theta h)} \qquad (0 < \theta < 1),$$

we can state the following proposition.


L'HOSPITAL'S RULE. If f₁(a) = f₂(a) = 0 and the ratio f₁'(x)/f₂'(x)
tends to a limit as x → a, then the ratio f₁(x)/f₂(x) tends to the
same limit.
If f₁'(a) = f₂'(a) = 0, then applying this rule to the ratio
f₁'(x)/f₂'(x) (and assuming, of course, the existence of the second
derivatives of the functions f₁(x) and f₂(x) in some neighborhood of
a), we have: if f₁(a) = f₂(a) = f₁'(a) = f₂'(a) = 0, then from the
relation f₁''(x)/f₂''(x) → l as x → a it follows that we also have
f₁(x)/f₂(x) → l. And, in general, if

$$f_1(a) = f_2(a) = f_1'(a) = f_2'(a) = \cdots = f_1^{(n-1)}(a) = f_2^{(n-1)}(a) = 0,$$

and if the functions f₁(x) and f₂(x) have derivatives of order n in
some neighborhood of a, then from the relation¹

$$\frac{f_1^{(n)}(x)}{f_2^{(n)}(x)} \to l \quad \text{as } x \to a$$

it follows that

$$\frac{f_1(x)}{f_2(x)} \to l \quad \text{as } x \to a.$$

¹ This implies that f₂⁽ⁿ⁾(x) ≠ 0 for x ≠ a in some neighborhood of a.
It then follows by Lagrange's formula that the same thing also holds
for f₂⁽ⁿ⁻¹⁾(x), f₂⁽ⁿ⁻²⁾(x), ..., f₂'(x), allowing us to apply Cauchy's
formula repeatedly and thus prove that
f₁(x)/f₂(x) = f₁⁽ⁿ⁾(ξ)/f₂⁽ⁿ⁾(ξ) for some ξ between a and x.

Thus if f₁⁽ⁿ⁾(x) and f₂⁽ⁿ⁾(x) are continuous at a and if f₂⁽ⁿ⁾(a) ≠ 0,
then

$$\lim_{x \to a} \frac{f_1(x)}{f_2(x)} = \frac{f_1^{(n)}(a)}{f_2^{(n)}(a)}. \qquad (8)$$

L'Hospital's rule serves as a very efficient tool for computing
limits, and in many cases allows us to find these limits very easily
when application of the elementary methods would involve great
difficulties. For example, if f₁(x) = tan x − sin x and f₂(x) = x³,
we have f₁(0) = f₁'(0) = f₁''(0) = 0, f₁'''(0) = 3, f₂(0) =
f₂'(0) = f₂''(0) = 0, and f₂'''(0) = 6. By (8), we obtain at once

$$\lim_{x \to 0} \frac{\tan x - \sin x}{x^3} = \frac{f_1'''(0)}{f_2'''(0)} = \frac{1}{2}.$$
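Here is a quick numerical check of this limit, as a Python sketch. The sample points are arbitrary.

```python
import math


def ratio(x: float) -> float:
    """(tan x - sin x) / x^3, whose limit at 0 was found above to be 1/2."""
    return (math.tan(x) - math.sin(x)) / x ** 3


for x in (0.5, 0.1, 0.01, 0.001):
    print(f"x = {x}: ratio = {ratio(x):.9f}")
```

Since tan x − sin x = x³/2 + x⁵/8 + ⋯, the printed values approach 1/2 roughly like 1/2 + x²/8.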


L'Hospital's rule remains valid even when a = +∞. For

$$\lim_{x \to +\infty} \frac{f_1(x)}{f_2(x)} = \lim_{t \to 0^+} \frac{f_1(1/t)}{f_2(1/t)},$$

and by virtue of what we have already proved, the last limit
coincides with

$$\lim_{t \to 0^+} \frac{-t^{-2} f_1'(1/t)}{-t^{-2} f_2'(1/t)} = \lim_{x \to +\infty} \frac{f_1'(x)}{f_2'(x)},$$

if this limit exists. Of course, a necessary prerequisite here for the
validity of the rule is the condition

$$\lim_{x \to +\infty} f_1(x) = \lim_{x \to +\infty} f_2(x) = 0.$$





Moreover, L’Hospital’s rule can also be applied to finding the 
limit of a ratio of infinitely large quantities, though here the proof 
is somewhat more complicated. Suppose, to be definite, that
f₁(x) → +∞ and f₂(x) → +∞ as x → a. In addition, we assume, as
before, that f₂'(x) never vanishes in some neighborhood of a. Let
us take in this neighborhood two arbitrary points x and α such
that x lies between α and a, and suppose, for definiteness, that
α < x < a (Fig. 22).

Fig. 22

By Cauchy's formula, we have

$$\frac{f_1(x) - f_1(\alpha)}{f_2(x) - f_2(\alpha)} = \frac{f_1'(\xi)}{f_2'(\xi)}, \qquad (9)$$

where ξ is some interior point of the interval [α, x]. Since, on
the other hand,
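$$\frac{f_1(x) - f_1(\alpha)}{f_2(x) - f_2(\alpha)} = \frac{f_1(x)}{f_2(x)} \cdot \frac{1 - \dfrac{f_1(\alpha)}{f_1(x)}}{1 - \dfrac{f_2(\alpha)}{f_2(x)}},$$

the relation (9) gives us

$$\frac{f_1(x)}{f_2(x)} = \frac{f_1'(\xi)}{f_2'(\xi)} \cdot \frac{1 - \dfrac{f_2(\alpha)}{f_2(x)}}{1 - \dfrac{f_1(\alpha)}{f_1(x)}}. \qquad (10)$$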


We shall now be more specific about the choice of the points x
and α. If, as we assume,

$$\lim_{x \to a} \frac{f_1'(x)}{f_2'(x)} = l$$

exists, then, for any arbitrarily small ε > 0, there is a neighborhood
U of a such that for x ∈ U we have

$$l - \varepsilon < \frac{f_1'(x)}{f_2'(x)} < l + \varepsilon.$$

Let α (and thus x also) belong to this neighborhood. Then ξ
also belongs to U, and, therefore,

$$l - \varepsilon < \frac{f_1'(\xi)}{f_2'(\xi)} < l + \varepsilon.$$

Let us now hold α fixed and let x approach a. Since, in this case,
f₁(x) → +∞ and f₂(x) → +∞, the second factor on the right side
of (10) tends to unity. Therefore, for all x in a neighborhood V of
the point a this factor will be contained between 1 − ε and 1 + ε.
Thus, for all x ∈ V formula (10) gives

$$(l - \varepsilon)(1 - \varepsilon) < \frac{f_1(x)}{f_2(x)} < (l + \varepsilon)(1 + \varepsilon),$$

and since ε is arbitrarily small, we have

$$\lim_{x \to a} \frac{f_1(x)}{f_2(x)} = l.$$

This rule, like the rule for the ratio of two infinitely small 
quantities, also remains valid when a = +∞.











EXAMPLES:
(a) f₁(x) = ln x and f₂(x) = 1/x. If x → 0 (through positive
values), then f₁(x) → −∞ and f₂(x) → +∞. Since
f₁'(x)/f₂'(x) = (1/x)/(−1/x²) = −x → 0 as x → 0, we have

$$\frac{f_1(x)}{f_2(x)} = x \ln x \to 0 \quad \text{as } x \to 0.$$

(b) Suppose f₂(x) = x. Then, as x → ∞, we have f₁(x) = ln x → +∞
and f₁'(x)/f₂'(x) = 1/x → 0. Hence,

$$\frac{\ln x}{x} \to 0 \quad \text{as } x \to \infty.$$

Substituting x = eʸ and replacing y again by x, we find x e⁻ˣ → 0
as x → ∞.
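The short Python sketch below tabulates the three limits of these examples. It only illustrates the trend and proves nothing. The function names are illustrative.

```python
import math


def x_log_x(x: float) -> float:
    return x * math.log(x)       # example (a): tends to 0 as x -> 0+


def log_x_over_x(x: float) -> float:
    return math.log(x) / x       # example (b): tends to 0 as x -> infinity


def x_exp_minus_x(x: float) -> float:
    return x * math.exp(-x)      # after the substitution x = e^y: tends to 0 as x -> infinity


for x in (1e-1, 1e-3, 1e-6):
    print(f"x = {x:.0e}: x ln x = {x_log_x(x):.3e}")
for x in (10.0, 1e3, 1e6):
    print(f"x = {x:.0e}: ln x / x = {log_x_over_x(x):.3e}, x e^-x = {x_exp_minus_x(x):.3e}")
```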


36. TAYLOR’S FORMULA 


We shall now take up Taylor’s formula which, as you un- 
doubtedly know, is one of the most important tools in analysis, as 
well as in its applications. This formula, although it is discussed in 
detail in all courses in analysis, nevertheless requires our attention. 
The derivation of the formula, as it is usually presented, is unin- 
spired and formal, leaving its essential content completely unillumi- 
nated. Thus, the importance which the formula later acquires is 
unexpected and sometimes remains a riddle forever to many 
students. 
As we have already stressed more than once, the basic relation

$$\lim_{h \to 0} \frac{f(a + h) - f(a)}{h} = f'(a)$$

may be written in the equivalent form

$$f(a + h) = f(a) + hf'(a) + \alpha h, \qquad (11)$$

where α → 0 as h → 0. Consequently, αh is here an infinitesimal of
higher order than h, that is, a quantity whose ratio to h tends
to zero as h → 0.

In general, let us agree to denote by o(x) any quantity whose ratio 
to the quantity x tends to zero during a given process of change. 
We then may write the product αh in the form o(h). And since, by
our definition, we have for any o(h)

$$\frac{o(h)}{h} = \alpha \to 0 \quad \text{as } h \to 0,$$

we also have the converse: every o(h) may be represented in the form
αh where α → 0 as h → 0. Consequently, the relation (11) can be
written in the form

$$f(a + h) = f(a) + hf'(a) + o(h).$$


As we see, this formula is valid in all cases where f’(a) exists. We 
have already had occasion to apply it more than once, and the rea- 
son for its usefulness is immediately clear: If h is so small that we 
can neglect quantities of the form o(h), then the approximation 
formula 


$$f(a + h) \approx f(a) + hf'(a)$$


permits us to replace the complicated function f(a + h) with a 
linear function. The substantial advantage which this gives us 
is obvious. 

It will now be readily understood why we might desire to 
extend this useful procedure somewhat further. If we wish to ob- 
tain greater precision, it is sometimes impossible to be satisfied 
with approximate formulas whose error is small only in comparison
with h. For example, we may have to take into consideration
quantities of order h², so that we can neglect only quantities of the
form o(h²), that is, quantities infinitesimal in comparison with h².
Then, of course, we must set ourselves to the problem of finding a
polynomial of the second degree a₀ + a₁h + a₂h² such that, for
small values of h, we shall have the relation

$$f(a + h) = a_0 + a_1 h + a_2 h^2 + o(h^2).$$

In general, if we want to take into consideration quantities of the
order hⁿ, but agree to neglect quantities of the form o(hⁿ), then we
shall, of course, look for a polynomial a₀ + a₁h + ⋯ + aₙhⁿ =
Pₙ(h) which will satisfy the relationship

$$f(a + h) = P_n(h) + o(h^n).$$

If we succeed in solving this problem, we shall then be able to use
a simple polynomial of degree n, instead of the (in general)
complicated function f(a + h) in all problems where quantities of the
form o(hⁿ) may be neglected.

For n = 1 we have already found the solution, assuming
that the function f(x) has a first derivative at x = a. In the general
case we shall, of course, assume that f⁽ⁿ⁾(a) exists. This obviously
requires that f(x) have derivatives of all preceding orders in some
neighborhood of the point a.

The celebrated Taylor's theorem gives us at once the solution
to the problem we have just described. That is, it proves the
existence of the desired polynomial Pₙ(h) and gives the formula for
its coefficients.


TAYLOR'S THEOREM. When f⁽ⁿ⁾(a) exists, the polynomial Pₙ(h) is
uniquely¹ defined by the formula

$$P_n(h) = f(a) + hf'(a) + \frac{h^2}{2!}f''(a) + \cdots + \frac{h^n}{n!}f^{(n)}(a).$$

In other words,

$$f(a + h) = f(a) + hf'(a) + \frac{h^2}{2!}f''(a) + \cdots + \frac{h^n}{n!}f^{(n)}(a) + o(h^n). \qquad (12)$$

¹ The uniqueness of Pₙ(h) follows immediately from the relation
f(a + h) = Pₙ(h) + o(hⁿ). For, if there were two such polynomials
Pₙ(h) = Σ aₖhᵏ and Qₙ(h) = Σ bₖhᵏ, it would follow that
Σₖ₌₀ⁿ (aₖ − bₖ)hᵏ = o(hⁿ). Letting h → 0, then dividing by h and
letting h → 0 again, and so on, we obtain successively a₀ − b₀ = 0,
a₁ − b₁ = 0, ..., aₙ − bₙ = 0.
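Proof. Let us set

$$f(a + h) - P_n(h) = \varphi(h);$$

then, to prove our assertion, it will suffice to show that

$$\frac{\varphi(h)}{h^n} \to 0 \quad \text{as } h \to 0. \qquad (12')$$

For this purpose we note that

$$\varphi(h) = f(a + h) - f(a) - hf'(a) - \cdots - \frac{h^n}{n!}f^{(n)}(a),$$
$$\varphi'(h) = f'(a + h) - f'(a) - hf''(a) - \cdots - \frac{h^{n-1}}{(n-1)!}f^{(n)}(a),$$
$$\cdots\cdots\cdots\cdots$$
$$\varphi^{(n-2)}(h) = f^{(n-2)}(a + h) - f^{(n-2)}(a) - hf^{(n-1)}(a) - \frac{h^2}{2!}f^{(n)}(a),$$
$$\varphi^{(n-1)}(h) = f^{(n-1)}(a + h) - f^{(n-1)}(a) - hf^{(n)}(a),$$

from which

$$\varphi(0) = \varphi'(0) = \cdots = \varphi^{(n-1)}(0) = 0.$$

Since the derivatives of the function hⁿ up to (and including)
order n − 2 also take the value zero for h = 0 and the derivative
of order n − 1 is equal to n!h, we have, by L'Hospital's rule,

$$\lim_{h \to 0} \frac{\varphi(h)}{h^n} = \lim_{h \to 0} \frac{\varphi^{(n-1)}(h)}{n!\,h},$$

assuming that the limit on the right side exists. But

$$\lim_{h \to 0} \frac{\varphi^{(n-1)}(h)}{n!\,h} = \frac{1}{n!} \lim_{h \to 0} \left[\frac{f^{(n-1)}(a + h) - f^{(n-1)}(a)}{h} - f^{(n)}(a)\right] = 0,$$

since f⁽ⁿ⁾(a), by our assumption, exists. This proves relation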
(12’) and the derivation of Taylor’s formula (12) is completed. 
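Relation (12') can be illustrated numerically. The Python sketch below builds Pₙ(h) from the derivative values and prints [f(a + h) − Pₙ(h)]/hⁿ, which should shrink with h. The choices f = sin, a = 0.3, and n = 3 are arbitrary, and the helper names are illustrative.

```python
import math


def taylor_polynomial(derivs, h):
    """P_n(h) = sum of f^(k)(a) h^k / k! for derivs = [f(a), f'(a), ..., f^(n)(a)]."""
    return sum(d * h ** k / math.factorial(k) for k, d in enumerate(derivs))


def sin_derivatives(a, n):
    """[sin a, sin' a, ..., sin^(n) a]; the derivatives of sin cycle with period 4."""
    cycle = (math.sin(a), math.cos(a), -math.sin(a), -math.cos(a))
    return [cycle[k % 4] for k in range(n + 1)]


a, n = 0.3, 3
derivs = sin_derivatives(a, n)
for h in (1e-1, 1e-2, 1e-3):
    remainder = math.sin(a + h) - taylor_polynomial(derivs, h)
    print(f"h = {h:.0e}: remainder / h^{n} = {remainder / h ** n:.3e}")
```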


As we proceed further, the question arises of finding convenient 
expressions for the remainder term o(hⁿ) in formula (12), so
that, when the need arises, we may estimate it as accurately as 
possible. We shall present here the most frequently used forms of 
such expressions, without giving their proofs, which can be found 
in any unabridged text in analysis. 
Let us denote by Rₙ(h) the quantity which, in formula (12), was
loosely denoted by o(hⁿ), and which we have also called φ(h). On
the mere assumption that f⁽ⁿ⁺¹⁾(a) exists, it can be shown that

$$\lim_{h \to 0} \frac{R_n(h)}{h^{n+1}} = \frac{f^{(n+1)}(a)}{(n + 1)!}.$$

If we assume that f⁽ⁿ⁺¹⁾(x) exists, not only at x = a, but
throughout the interval [a, a + h], we can obtain for Rₙ(h) an
expression known as Schlömilch's form of the remainder:

$$R_n(h) = \frac{h^{n+1}(1 - \theta)^{n+1-p}}{p \cdot n!}\, f^{(n+1)}(a + \theta h),$$

where 0 < θ < 1 and p is any positive number not exceeding
n + 1. From this very general formula we obtain, by particular
choice of the number p, a number of special forms of the remainder.
Those used most frequently are:

(a) For p = n + 1, we obtain Lagrange's form of the remainder,

$$R_n(h) = \frac{h^{n+1}}{(n + 1)!}\, f^{(n+1)}(a + \theta h);$$

(b) For p = 1, we obtain Cauchy's form of the remainder,

$$R_n(h) = \frac{h^{n+1}(1 - \theta)^n}{n!}\, f^{(n+1)}(a + \theta h).$$

In these individual forms, as well as in the general form, we assume
that f⁽ⁿ⁺¹⁾(x) exists throughout the interval [a, a + h].
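Here is an illustration of Lagrange's form, using our own example rather than one from the text. For f(x) = eˣ, a = 0, and h = 1, the formula reads Rₙ(1) = e^θ/(n + 1)! with 0 < θ < 1, so 1/(n + 1)! < Rₙ(1) < e/(n + 1)!. The Python sketch below computes Rₙ(1) directly, recovers θ, and checks the bound. The helper names are illustrative.

```python
import math


def exp_remainder(n: int, h: float = 1.0) -> float:
    """R_n(h) = e^h - (1 + h + ... + h^n / n!) for f = exp at a = 0."""
    return math.exp(h) - sum(h ** k / math.factorial(k) for k in range(n + 1))


def lagrange_theta(n: int, h: float = 1.0) -> float:
    """Solve R_n(h) = h^(n+1) e^(theta h) / (n+1)! for theta (f = exp, a = 0, h > 0)."""
    return math.log(exp_remainder(n, h) * math.factorial(n + 1) / h ** (n + 1)) / h


for n in (1, 3, 5, 10):
    bound = math.e / math.factorial(n + 1)
    print(f"n = {n}: R_n(1) = {exp_remainder(n):.3e}, bound e/(n+1)! = {bound:.3e}, theta = {lagrange_theta(n):.4f}")
```

In practice, a bound of this kind decides how many terms are needed. For instance, e/(n + 1)! < 10⁻⁸ already holds for n = 11.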

As you already know, the name Maclaurin’s formula designates 
the (in no way particularly remarkable) special case of Taylor’s 
formula for a = 0: 


$$f(h) = f(0) + hf'(0) + \cdots + \frac{h^n}{n!}f^{(n)}(0) + o(h^n).$$


37. MAXIMA AND MINIMA 


An important application of Taylor’s formula in elementary 
differential calculus is the theory of maxima and minima, the 
fundamentals of which are already known to you. Suppose we are 
looking for a point in the interval [a, b] at which y = f(x),
differentiable in this interval, attains its greatest value. This
absolute maximum may be attained either at an end point of the
interval or at an interior point c. In the latter case, the point c is
also a point at which the function attains a relative maximum, which
means that f(c) ≥ f(x) for all x in a neighborhood of c. Thus, the search
for the maximum of a function in a given interval is reduced 
to finding all its relative maxima in that interval. And it is to this 
problem that we apply the methods of differential calculus. 

You know, of course, that a necessary condition for a differenti- 
able function f(x) to reach a relative maximum (or minimum) at 
the point x (a < x < b) is the relation

$$f'(x) = 0. \qquad (13)$$


Thus, the first step toward the solution of the problem in question 
consists in finding all the real roots of the equation (13), the 
so-called critical values of x. After this is done, each critical value 
must be tested individually to establish whether or not the function 
attains a maximum or minimum there. We shall turn our attention 
momentarily to this testing process. 

Let α be one of the critical values of f(x), that is, let a < α < b
and f'(α) = 0. And let us assume that f⁽ⁱ⁾(α) = 0 for 1 ≤ i < n, but
that f⁽ⁿ⁾(α) ≠ 0. In such a case, Taylor's formula (12) gives

$$f(\alpha + h) - f(\alpha) = \frac{h^n}{n!}f^{(n)}(\alpha) + o(h^n). \qquad (14)$$

It is clear that the sign of the right-hand side of (14), whose
second term is infinitesimal in comparison to the first, coincides
with the sign of this first term for sufficiently small |h|, that
is, coincides with the sign of the product hⁿf⁽ⁿ⁾(α). If n is odd, then
hⁿ, and hence the whole product, changes sign when h (which may
be positive or negative) changes sign. In this case the difference
f(α + h) − f(α) will also, by virtue of (14), change sign as the
increment h changes sign. This clearly implies that f(x) has
neither a maximum nor a minimum at x = α. Now suppose that n is
even, so that hⁿ > 0 for any h ≠ 0. The sign of the difference
f(α + h) − f(α) for all sufficiently small |h| clearly coincides with
the sign of f⁽ⁿ⁾(α), which does not depend on h. If f⁽ⁿ⁾(α) > 0, then
for all sufficiently small |h| (h ≠ 0) we have

$$f(\alpha + h) > f(\alpha),$$

that is, f(x) has a relative minimum at x = α. And similarly, if
f⁽ⁿ⁾(α) < 0, then f(x) has a relative maximum at x = α. We
thus arrive at the following rule:


Let n be the order of the first nonvanishing derivative of f(x)
at x = α. If n is even, f(x) has a maximum or a minimum at x = α
according to whether f⁽ⁿ⁾(α) < 0 or f⁽ⁿ⁾(α) > 0, respectively. But if n
is odd, then f(x) has neither a maximum nor a minimum at x = α.


We can see that this rule solves completely the problem of de- 
termining the nature of each critical point, provided only that the 
function is differentiable at this point a sufficient number of times 
and that not all of its derivatives are equal to zero at this point. 
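The rule lends itself to a mechanical check. The Python sketch below takes the list of derivative values f'(α), f''(α), … at a critical point and applies the rule to the first one that does not vanish. The caller supplies these values. The function name and the tolerance parameter are our own assumptions.

```python
def classify_critical_point(derivatives, tol=0.0):
    """Classify a critical point alpha from [f'(alpha), f''(alpha), f'''(alpha), ...].

    Returns "minimum", "maximum", "neither", or "undetermined" when every listed
    derivative vanishes (the rule then gives no answer with the data provided).
    """
    if not derivatives or abs(derivatives[0]) > tol:
        raise ValueError("alpha is a critical point only if f'(alpha) = 0")
    for order, value in enumerate(derivatives, start=1):
        if abs(value) > tol:
            if order % 2 == 1:
                return "neither"  # odd order: f(alpha + h) - f(alpha) changes sign with h
            return "minimum" if value > 0 else "maximum"
    return "undetermined"


print(classify_critical_point([0, 0, 0, 24]))           # f(x) = x^4 at 0
print(classify_critical_point([0, 0, 6]))               # f(x) = x^3 at 0
print(classify_critical_point([0, -1]))                 # f(x) = cos x at 0
print(classify_critical_point([0, 0, 0, 0, 0, -720]))   # f(x) = -x^6 at 0
```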


38. PARTIAL DERIVATIVES 


We must now direct our attention to functions of several vari- 
ables. But for simplicity we shall restrict ourselves to functions of 
only two independent variables x and y. 


As you know, the partial derivative f_x'(x, y) or ∂f(x, y)/∂x of f(x, y)
with respect to x is defined as

$$\lim_{\Delta x \to 0} \frac{f(x + \Delta x, y) - f(x, y)}{\Delta x},$$

that is, as the derivative in the ordinary sense of the function of x
obtained from f(x, y) by assigning to y a constant value. The partial
derivative f_y'(x, y) is defined analogously. And each of these
partial derivatives is, in turn, a function of x and y. For convenience
and clarity, we shall frequently replace the words at x = a and y = b
by the expression at the point (a, b), meaning the point in the (x, y)
plane having the coordinates x = a and y = b.

First of all, we must define the concept of differentiability for a 
function z = f(x, y) at a point (a, b). For a function of one varia- 
ble, the existence of a derivative was equivalent to the existence of 
a differential. And, as we have seen, the differential dy could be
defined for y = f(x) as a linear function of the increment Δx which
differs from Δy by an infinitesimal of higher order as Δx → 0.
Let us now consider the situation for a function of two variables
z = f(x, y). Here again, we agree to define a differential dz of the
function z as a linear combination AΔx + BΔy + C of the increments
Δx and Δy which, as Δx → 0 and Δy → 0, differs from Δz by an
infinitesimal of higher order. In the one-dimensional case, we could
(assuming f'(x) ≠ 0) accept as the fundamental infinitesimal any one
of the quantities Δx, Δy, or dy (their order of smallness being
identical). In the two-dimensional case, it is convenient (though by
no means necessary) to take as the fundamental infinitesimal the
quantity ρ = √(Δx² + Δy²), which expresses the distance of a
displaced point (x + Δx, y + Δy) from the initial point (x, y). If Δx
and Δy are infinitesimals of the same order, then the quantity ρ is
clearly of this same order.

Thus, the expression AΔx + BΔy + C is a differential dz of a
function z = f(x, y) if, as Δx → 0 and Δy → 0, we have the relation

$$\Delta z - (A\,\Delta x + B\,\Delta y + C) = o(\rho). \qquad (15)$$


We shall now show that if dz exists, then the partial derivatives
∂z/∂x and ∂z/∂y also exist and A = ∂z/∂x, B = ∂z/∂y, and C = 0,
so that

$$dz = \frac{\partial z}{\partial x}\,\Delta x + \frac{\partial z}{\partial y}\,\Delta y. \qquad (16)$$

First of all, because of the continuity of the function z (which we
assume, of course) the relation (15), by a passage to the limit, gives
C = 0. Having established this, let us set Δy = 0 in formula (15).
According to our assumption, this formula must be valid for any
way in which the quantities Δx and Δy tend to zero. This gives
Δz = AΔx + o(Δx) when Δy = 0, and hence, A = ∂z/∂x. Similarly,
we may establish that B = ∂z/∂y. The existence of the partial
derivatives ∂z/∂x and ∂z/∂y is, therefore, a consequence of the
existence of the differential dz.
So far everything runs in complete analogy with the one-dimensional
case. Unlike the one-dimensional case, however, the existence at a
given point of the partial derivatives ∂z/∂x and ∂z/∂y does not
guarantee the existence of the differential dz. For example,
considering the function

$$z = \begin{cases} \dfrac{2xy}{\sqrt{x^2 + y^2}} & \text{if } x^2 + y^2 \neq 0, \\ 0 & \text{if } x = y = 0, \end{cases}$$

at (0, 0), we obtain

$$(\Delta z)_{\Delta x = 0} = (\Delta z)_{\Delta y = 0} = 0,$$

and, hence,

$$\frac{\partial z}{\partial x} = \frac{\partial z}{\partial y} = 0.$$

If the differential dz existed at x = y = 0 then, by (16), it would be
identically zero, so that, from (15), we would obtain Δz = o(ρ). But
this is contradictory, since for Δx = Δy ≠ 0, we have

$$\Delta z = \sqrt{2}\,|\Delta x|, \qquad \rho = \sqrt{2}\,|\Delta x|, \qquad \text{and} \qquad \Delta z = \rho.$$
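The Python sketch below examines the counterexample numerically. The sample increments are arbitrary. Both difference quotients at the origin vanish, yet along the diagonal Δx = Δy the ratio Δz/ρ stays equal to 1 instead of tending to 0. Note also that |z| ≤ ρ, because 2|xy| ≤ x² + y², so the function is continuous at the origin.

```python
import math


def z(x: float, y: float) -> float:
    """z = 2xy / sqrt(x^2 + y^2), extended by z(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return 2 * x * y / math.hypot(x, y)


h = 1e-6
dz_dx = (z(h, 0.0) - z(0.0, 0.0)) / h   # z vanishes on the x-axis
dz_dy = (z(0.0, h) - z(0.0, 0.0)) / h   # and on the y-axis
print("partial quotients at the origin:", dz_dx, dz_dy)

for t in (1e-1, 1e-3, 1e-6, -1e-6):
    rho = math.hypot(t, t)
    print(f"dx = dy = {t:.0e}: dz / rho = {z(t, t) / rho:.12f}")
```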


Let us observe, however, that if the partial derivatives ∂z/∂x and
∂z/∂y are continuous at the point (x, y), then the expression (16) is
the differential of z = f(x, y) at that point. For,

$$\Delta z = f(x + \Delta x, y + \Delta y) - f(x, y) = [f(x + \Delta x, y + \Delta y) - f(x, y + \Delta y)] + [f(x, y + \Delta y) - f(x, y)].$$

The first difference on the right-hand side has, by Lagrange's
mean value theorem, the form

$$\Delta x\, f_x'(x + \theta\,\Delta x, y + \Delta y) \qquad (0 < \theta < 1).$$

From the assumed continuity of f_x'(x, y) at the point (x, y), it
follows that, as Δx → 0 and Δy → 0, we have

$$f_x'(x + \theta\,\Delta x, y + \Delta y) - f_x'(x, y) \to 0.$$

Hence, the first of the two differences mentioned above differs
from f_x'(x, y)Δx by a quantity of the form o(Δx) = o(ρ):

$$f(x + \Delta x, y + \Delta y) - f(x, y + \Delta y) = f_x'(x, y)\,\Delta x + o(\rho).$$
On the other hand, from the very definition of partial derivatives,
we have

$$f(x, y + \Delta y) - f(x, y) = f_y'(x, y)\,\Delta y + o(\Delta y) = f_y'(x, y)\,\Delta y + o(\rho).$$

It follows that

$$\Delta z = f_x'(x, y)\,\Delta x + f_y'(x, y)\,\Delta y + o(\rho),$$

which is exactly what we set out to prove. Our argument even
shows that it is sufficient to assume the continuity of only one of
the two partial derivatives at the point (x, y).

We have seen, in the general case, that partial derivatives may 
exist even when the differential does not. Hence, in defining differ- 
entiability for a function of two variables, we must make a choice: 
should we base the definition on the existence of the partial deriva- 
tives, or on the existence of the differential? By general agreement, 
we call a function differentiable at a given point if it has a differential 
at that point. Such a definition, requiring of a differentiable func- 
tion something more than the mere existence of the partial deriva- 
tives, is more convenient because of the closer analogy to the one- 
dimensional case. As further development of the theory reveals, 
only functions having differentials show in their properties some 
close resemblance to differentiable functions of a single independ- 
ent variable. 


To obtain the partial derivative $\dfrac{\partial z}{\partial x}$, we add an increment to
only one variable, $x$, leaving the variable $y$ unchanged. Speaking
geometrically, we displace the point $P(x, y)$ parallel to the $x$-axis
to the point $P_1(x + \Delta x, y)$ (Fig. 23).

[Fig. 23: the points $P(x, y)$, $P_1(x + \Delta x, y)$, $P_2(x, y + \Delta y)$, and $Q(x + \Delta x, y + \Delta y)$.]

The partial derivative $\dfrac{\partial z}{\partial x}$ is defined as the limit of the ratio
$$\frac{f(P_1) - f(P)}{PP_1}$$
as $P_1 \to P$, where $PP_1 = \Delta x$ is the distance from $P$ to $P_1$,
$f(P) = f(x, y)$, and $f(P_1) = f(x + \Delta x, y)$. Similarly,
$$\frac{\partial z}{\partial y} = \lim_{P_2 \to P} \frac{f(P_2) - f(P)}{PP_2},$$

where $P_2$ is the point with coordinates $(x, y + \Delta y)$ which is made
to approach the point $P$ along a line parallel to the $y$-axis. We can
thus say that, for the function $z$ at the point $P$, the derivative
$\dfrac{\partial z}{\partial x}$ is the derivative in the direction $Ox$ and
$\dfrac{\partial z}{\partial y}$ is the derivative in the direction $Oy$. But it is
clear that, in a completely analogous way, we can define a partial
derivative in any other direction, say the direction making the
angle $\varphi$ with the positive $x$-axis. For this purpose, we need only
displace the point $P$ to a point $Q$ where the vector $\overrightarrow{PQ}$ forms
the angle $\varphi$ with the direction $Ox$, and then make $Q$ approach $P$
along this vector. It will be natural to call the quantity
$$\lim_{Q \to P} \frac{f(Q) - f(P)}{PQ}$$
(if it exists) the derivative of $z = f(x, y)$ in the direction $\varphi$. It is
clear that specifying such a mode of approach is equivalent to fixing
a linear dependence $\Delta y = \tan\varphi\,\Delta x$ between the increments $\Delta x$
and $\Delta y$ as they simultaneously approach zero.

If we agree to denote the derivative of $z = f(x, y)$ in the direction
$\varphi$ by $D_\varphi(z)$, we may then write
$$\frac{\partial z}{\partial x} = D_0(z) \quad\text{and}\quad \frac{\partial z}{\partial y} = D_{\pi/2}(z).$$
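We shall now show that a function $z = f(x, y)$ differentiable at a
point $(x, y)$ has, at this point, a derivative in any direction, and
we have
$$D_\varphi(z) = \frac{\partial z}{\partial x}\cos\varphi + \frac{\partial z}{\partial y}\sin\varphi.$$
Since, corresponding to the displacement of the point $(x, y)$ in the
direction $\varphi$ through a distance $\rho$, we have
$$\Delta z = \frac{\partial z}{\partial x}\,\Delta x + \frac{\partial z}{\partial y}\,\Delta y + o(\rho),$$
where
$$\Delta x = \rho\cos\varphi \quad\text{and}\quad \Delta y = \rho\sin\varphi,$$
it follows that
$$\frac{\Delta z}{\rho} = \frac{\partial z}{\partial x}\cos\varphi + \frac{\partial z}{\partial y}\sin\varphi + \frac{o(\rho)}{\rho}.$$
That is,
$$D_\varphi(z) = \lim_{\rho \to 0}\frac{\Delta z}{\rho} = \frac{\partial z}{\partial x}\cos\varphi + \frac{\partial z}{\partial y}\sin\varphi.$$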
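The formula for $D_\varphi(z)$ can be checked numerically by comparing the quotient $\dfrac{f(Q) - f(P)}{PQ}$, for a small distance $PQ = \rho$, with $\dfrac{\partial z}{\partial x}\cos\varphi + \dfrac{\partial z}{\partial y}\sin\varphi$. The sample function $f(x, y) = x^2 y + \sin y$, its hand-computed partial derivatives, the point $(1, 0.5)$, and the step $\rho = 10^{-6}$ are all assumptions made for this sketch.

```python
import math


def f(x: float, y: float) -> float:
    # Sample differentiable function (our choice): f(x, y) = x^2 * y + sin(y)
    return x * x * y + math.sin(y)


def partials(x: float, y: float) -> tuple:
    # Computed by hand: f_x = 2xy, f_y = x^2 + cos(y)
    return 2 * x * y, x * x + math.cos(y)


def directional_quotient(x: float, y: float, phi: float, rho: float) -> float:
    # (f(Q) - f(P)) / PQ, where Q lies at distance rho from P in the direction phi
    qx, qy = x + rho * math.cos(phi), y + rho * math.sin(phi)
    return (f(qx, qy) - f(x, y)) / rho


def directional_formula(x: float, y: float, phi: float) -> float:
    # D_phi(z) = f_x cos(phi) + f_y sin(phi)
    fx, fy = partials(x, y)
    return fx * math.cos(phi) + fy * math.sin(phi)


for phi in (0.0, math.pi / 6, math.pi / 2, 2.0, -1.0):
    print(phi, directional_quotient(1.0, 0.5, phi, 1e-6), directional_formula(1.0, 0.5, phi))
```

The two columns agree up to an error of order $\rho$, and $\varphi = 0$ and $\varphi = \pi/2$ reproduce $\partial z/\partial x$ and $\partial z/\partial y$.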


39. DIFFERENTIATING IMPLICIT FUNCTIONS 


The further development of the theory of partial differentiation 
proceeds in rather close analogy with the differential calculus of 
the one-dimensional case, and, for lack of time, we shall not deal 
with it. Instead, we have still to consider some important problems 
in the one-dimensional case for whose solution we utilize partial 
derivatives. Under this heading belong, first of all, theorems on the 
existence and differentiability of implicit functions. (This generally 
accepted name is rather unsatisfactory. It would be more proper to 
speak of functions given implicitly or defined implicitly, since it 
is not a question of a special kind of functions, but only of a 
special way of defining them.) 

It has, of course, been brought to your attention more than 
once that the equation 


$$F(x, y) = 0 \qquad (17)$$


determines y as an implicit function of x. This means that 
there exists a function y = f(x) such that 


$$F(x, f(x)) = 0. \qquad (18)$$
It is clear that the conditions which guarantee the existence of
such a function, its properties, and also the range of values of $x$ for
which equation (18) is valid, all require special study. Here we
shall prove only the fundamental theorem in this area. The further
development of the theory of implicit functions, based upon
this theorem, may be found in any unabridged course in analysis 
and presents nothing new in principle. 


THEOREM 4. Let $F(x, y)$ be (i) continuous in a neighborhood of
the point $(x_0, y_0)$, (ii) differentiable at $(x_0, y_0)$, and (iii) equal to zero
at that point, while at the same time $F'_y(x_0, y_0) \neq 0$. Then there
exists, in some neighborhood of the point $x_0$, a function $y = f(x)$ for
which $f(x_0) = y_0$ and the equality (18) is an identity. This function
$f(x)$ is differentiable at $x_0$ and
$$f'(x_0) = -\frac{F'_x(x_0, y_0)}{F'_y(x_0, y_0)}.$$
Proof. To be definite, let us assume that $F'_y(x_0, y_0) > 0$. Then,
for a sufficiently small positive $\beta$, we shall have $F(x_0, y_0 - \beta) < 0$
and $F(x_0, y_0 + \beta) > 0$ (Fig. 24), with $F(x, y)$ continuous at $(x_0, y)$
for all $y$ in $[y_0 - \beta, y_0 + \beta]$ and nonzero there except at $(x_0, y_0)$.
Therefore, for $\alpha > 0$ sufficiently small, the inequalities
$$F(x, y_0 - \beta) < 0 \quad\text{and}\quad F(x, y_0 + \beta) > 0$$
will be valid for all points $x$ in the interval $[x_0 - \alpha, x_0 + \alpha]$.

[Fig. 24: the points $(x_0, y_0 - \beta)$, $(x_0, y_0)$, and $(x_0, y_0 + \beta)$.]

Utilizing Theorem 4 in Lecture 3 (p. 57), we can find, by virtue
of the continuity of $F$, for each such $x$ a value $y$ ($|y - y_0| < \beta$)
such that $F(x, y) = 0$. This value of $y$, dependent on $x$ (or one of
them if there are several¹), we shall denote by $f(x)$, so that
$$F(x, f(x)) = 0 \qquad (18)$$
for $x_0 - \alpha \le x \le x_0 + \alpha$. Clearly, at $x = x_0$ the function $f(x)$ has
only one possible value $y_0$, so that $f(x_0) = y_0$.

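It remains only to prove the existence of the derivative $f'(x_0)$
and to find its value. Let $|\Delta x| < \alpha$ and $\Delta y = f(x_0 + \Delta x) - f(x_0)$.
Then, by (18), we shall have $F(x_0 + \Delta x, y_0 + \Delta y) = F(x_0 + \Delta x, f(x_0 + \Delta x)) = 0$.
By the differentiability of $F(x, y)$ at the point $(x_0, y_0)$, we obtain
$$\begin{aligned} 0 &= F(x_0 + \Delta x, y_0 + \Delta y) - F(x_0, y_0) = \Delta F \\ &= F'_y(x_0, y_0)\,\Delta y + F'_x(x_0, y_0)\,\Delta x + o(\rho), \end{aligned}$$
where
$$\rho = \sqrt{\Delta x^2 + \Delta y^2} \le |\Delta x| + |\Delta y|.$$
It follows from this that
$$F'_y(x_0, y_0)\,\Delta y + F'_x(x_0, y_0)\,\Delta x = \lambda\,(|\Delta x| + |\Delta y|),$$
where $\lambda \to 0$ as $\rho \to 0$; or, in another form,²
$$[F'_y(x_0, y_0) \pm \lambda]\,\Delta y + [F'_x(x_0, y_0) \pm \lambda]\,\Delta x = 0.$$
Since $\gamma = F'_y(x_0, y_0) > 0$, we have, for sufficiently small $\alpha$ and
$\beta$ (which implies sufficiently small $\Delta x$, $\Delta y$, and $\rho$),
$$|\lambda| < \tfrac{1}{2}\gamma \quad\text{and}\quad F'_y(x_0, y_0) \pm \lambda > \tfrac{1}{2}\gamma,$$
and, consequently,
$$\frac{\Delta y}{\Delta x} = -\frac{F'_x(x_0, y_0) \pm \lambda}{F'_y(x_0, y_0) \pm \lambda} \quad\text{and}\quad \left|\frac{\Delta y}{\Delta x}\right| < \frac{|F'_x(x_0, y_0)| + \tfrac{1}{2}\gamma}{\tfrac{1}{2}\gamma}.$$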


¹ For instance, we may take this value of $y$ to be the g.l.b. of all $y$ in the
interval $[y_0 - \beta, y_0 + \beta]$ for which $F(x, y) = 0$. It will then follow from
the continuity of $F$ that $F(x, f(x)) = 0$.

² The signs in front of $\lambda$ need not be in agreement. For example, if $\Delta x > 0$
and $\Delta y < 0$, we have $+\lambda$ in the first term and $-\lambda$ in the second term.
It is at once evident from this inequality that as $\Delta x \to 0$ we have
$\Delta y \to 0$, and hence $\rho \to 0$ and $\lambda \to 0$ also. Consequently,
$$f'(x_0) = \lim_{\Delta x \to 0}\frac{\Delta y}{\Delta x} = -\frac{F'_x(x_0, y_0)}{F'_y(x_0, y_0)}.$$
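The proof suggests a simple numerical procedure: for each $x$ near $x_0$, look for a root $y = f(x)$ in $[y_0 - \beta, y_0 + \beta]$, where $F$ changes sign, and then compare a difference quotient of $f$ with $-F'_x(x_0, y_0)/F'_y(x_0, y_0)$. The sketch below uses bisection for the root search (our choice; the proof only needs the existence of a root) and the example $F(x, y) = x^2 + y^2 - 1$ at $(x_0, y_0) = (0.6, 0.8)$, where the formula gives $f'(0.6) = -1.2/1.6 = -0.75$. The names `F`, `solve_for_y`, `f` and the values of $\beta$ and $h$ are illustrative assumptions.

```python
def F(x: float, y: float) -> float:
    # Example equation F(x, y) = 0: the unit circle (our choice)
    return x * x + y * y - 1.0


def solve_for_y(x: float, lo: float, hi: float, tol: float = 1e-13) -> float:
    """Bisection for F(x, y) = 0 on [lo, hi], assuming F(x, lo) < 0 < F(x, hi)."""
    if not (F(x, lo) < 0 < F(x, hi)):
        raise ValueError("F(x, .) does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(x, mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x0, y0, beta = 0.6, 0.8, 0.1


def f(x: float) -> float:
    # The implicitly defined function, searched for in [y0 - beta, y0 + beta]
    return solve_for_y(x, y0 - beta, y0 + beta)


# F'_x = 2x and F'_y = 2y, computed by hand
predicted = -(2 * x0) / (2 * y0)
h = 1e-5
central_difference = (f(x0 + h) - f(x0 - h)) / (2 * h)
print(f(x0), predicted, central_difference)
```

The limit in the theorem is two-sided, so either a one-sided or a central quotient would do; the central one is used here only because its truncation error is smaller.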


Let us further note that if the derivative $F'_y$ exists and $F'_y \neq 0$
not only at the point $(x_0, y_0)$, but also in some neighborhood
of that point, then the solution of equation (17) which we have
found is unique. For, if $F(x, y_1) = 0$ and $F(x, y_2) = 0$ where $y_1 < y_2$,
we would have by Lagrange's theorem
$$F(x, y_2) - F(x, y_1) = 0 = (y_2 - y_1)\,F'_y(x, \bar y),$$
where $y_1 < \bar y < y_2$. But from this $F'_y(x, \bar y) = 0$, which is impossible
by our assumptions.

Let us now apply this rule for differentiating an implicit func- 
tion to the simplest case of the so-called conditional maxima and 
minima problem. The generalization of this problem to the case of 
a larger number of variables constitutes the important and interest- 
ing theory of conditional extrema, which regrettably we cannot fully 
develop here. 

Suppose that $F(x, y)$ is differentiable in some domain. Furthermore,
suppose that $x$ is an independent variable while $y$ is defined
in $[a, b]$ by the relation
$$\Phi(x, y) = 0, \qquad (19)$$
where $\Phi(x, y)$ is also differentiable. Thus, $F(x, y)$ is actually a
function of one independent variable $x$, defined in the complicated
manner just described. Both in analysis and in its applications we 
frequently have to consider functions defined in such a way. 

We wish to find the extrema of the given function in the inter- 
val [a, b]. According to the general theory, the derivative of 
the function must assume the value zero at these extrema (relative 
maxima and minima). We must therefore equate to zero the deriv- 
ative of z = F(x, y) with respect to x, considering y as a function 
of $x$ defined by (19); that is, we must equate to zero the so-called
total derivative $\dfrac{dz}{dx}$ of the given function with respect to $x$:
$$\frac{dz}{dx} = \frac{\partial F}{\partial x} + \frac{\partial F}{\partial y}\,\frac{dy}{dx} = 0. \qquad (20)$$
In this equation $\dfrac{\partial F}{\partial x}$ and $\dfrac{\partial F}{\partial y}$ are both functions of $x$ and $y$.


To determine the quantity $\dfrac{dy}{dx}$ we must, of course, apply the
rule for the differentiation of implicit functions. From equation (19),
which defines the function $y$, we obtain
$$\frac{dy}{dx} = -\frac{\partial\Phi/\partial x}{\partial\Phi/\partial y},$$
and, consequently, (20) can be written in the form
$$\frac{\partial F}{\partial x}\,\frac{\partial\Phi}{\partial y} - \frac{\partial\Phi}{\partial x}\,\frac{\partial F}{\partial y} = 0.$$
This equation, together with (19), permits us to determine all the
critical pairs of values $(x, y)$, that is, all the points at which $\dfrac{dz}{dx} = 0$.
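As an illustration (our own example, not one from the text), take $F(x, y) = xy$ with the constraint $\Phi(x, y) = x^2 + y^2 - 1 = 0$ and the upper branch $y = \sqrt{1 - x^2}$ on $[a, b] = [-0.9, 0.9]$. The condition above becomes $y \cdot 2y - 2x \cdot x = 0$, so on this branch the critical points are $x = \pm 1/\sqrt{2}$. The sketch checks that the condition vanishes there and that a brute-force scan of $z = F(x, y(x))$ finds its maximum at the same place; the grid size is an arbitrary choice.

```python
import math


def F(x: float, y: float) -> float:
    return x * y


def grad_F(x: float, y: float) -> tuple:
    # (dF/dx, dF/dy) for F(x, y) = xy
    return y, x


def grad_Phi(x: float, y: float) -> tuple:
    # (dPhi/dx, dPhi/dy) for Phi(x, y) = x^2 + y^2 - 1
    return 2 * x, 2 * y


def condition(x: float, y: float) -> float:
    # Left-hand side of  dF/dx * dPhi/dy - dPhi/dx * dF/dy = 0
    F_x, F_y = grad_F(x, y)
    Phi_x, Phi_y = grad_Phi(x, y)
    return F_x * Phi_y - Phi_x * F_y


def branch(x: float) -> float:
    # y(x) defined by Phi(x, y) = 0, taking the upper branch y > 0
    return math.sqrt(1.0 - x * x)


# Positive root of the condition on the branch: 2y^2 - 2x^2 = 0 with y = sqrt(1 - x^2)
x_star = 1 / math.sqrt(2)

# Brute-force scan of z = F(x, y(x)) on [a, b] = [-0.9, 0.9] for comparison
a, b, n = -0.9, 0.9, 18_000
grid = [a + (b - a) * i / n for i in range(n + 1)]
x_best = max(grid, key=lambda x: F(x, branch(x)))
print(x_star, x_best, F(x_best, branch(x_best)))
```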


6. The Integral 


40. INTRODUCTION 


The integral calculus was initially developed independently of 
the theory of differentiation. Only toward the end of the seven- 
teenth century, after both disciplines had already achieved a con- 
siderable degree of development and had succeeded in solving, 
each by its own methods, a great number of problems in geometry 
and mechanics, was the profound connection between them fully 
revealed. It then became clear that their fundamental problems are 
mutually inverse, and integration and differentiation of functions 
are to each other as addition and subtraction of numbers. 

This historical moment is usually considered to be the birth of 
the science which is now called mathematical analysis. And from 
this moment onward, with the idea of their unbreakable theoretical 
bond serving as the main propellant of both subjects, their devel- 
opment took on extraordinary speed. In the integral calculus, 
in particular, the theory passed from the solution of individual
unrelated problems to the creation of quite powerful and general
methods. 

The historical development of the two basic branches of analy- 
sis is still reflected in the way they are presented in textbooks. 
While some authors, for the sake of greater logical coherence, de- 
fine the integration of functions as an operation inverse to differ- 
entiation, others, reproducing to a certain degree the historical de- 
velopment, prefer to define the two operations independently of 
each other, and only later establish their mutual connection. On 
the formal side it is, of course, irrelevant which of these paths we 
follow, and it is quite difficult, perhaps impossible, to maintain 
that one of these methods is preferable to the other. The point is, 
that here, as almost everywhere, much depends on the needs and 
interests of the student. One for whom a logical and theoretical 
interest in mathematics outweighs the interest in the applied 
and the practical will welcome the introduction of integration as 
an operation inverse to differentiation. Following the habit of 
mathematicians, he himself might have wondered, while learning
differentiation, what the inverse of such an operation would be
like. To the student who is interested in practical applications, this
approach may appear artificial: studying an inverse operation 
as such, when one does not yet see in what concrete problems 
it can be useful, may seem to lack justification. 


41. DEFINITION OF THE INTEGRAL


The concept of the integral arose and gradually entrenched it- 
self in a position of importance when a whole series of problems in 
geometry and mechanics led to the necessity of performing the same 
analytical operation on a variety of functions. The essence of this 
operation was a certain passage to the limit, the nature of which is 
well known to you. 

We are given a function $f(x)$ on a closed interval $[a, b]$. Let us
subdivide this interval into $n$ parts, denoting the points of partition by
$$a = x_0 < x_1 < x_2 < \cdots < x_n = b.$$
In each of the subintervals $[x_{k-1}, x_k]$ (where $k = 1, 2, \ldots, n$) we
choose an arbitrary point $\xi_k$. We multiply the value of the function
$f(x)$ at the point $\xi_k$ by the length $x_k - x_{k-1}$ of the corresponding
subinterval, and form the sum of all these products,
$$\sum_{k=1}^{n} f(\xi_k)\,(x_k - x_{k-1}). \qquad (1)$$


Let us now imagine that we have before us the class of all
such partitions and of all possible choices of the points $\xi_k$ (remembering
that the number $n$ of subintervals may be changed arbitrarily),
and, hence, also the set of all possible values of the sum
(1). Let us denote by $l$ the length of the largest subinterval $[x_{k-1}, x_k]$
in a given partition. If there exists a number $I$ such that the sum
(1) tends to $I$ as $l \to 0$, regardless of how the partitions are
constructed and the points $\xi_k$ are selected, then we call that number
the integral of $f(x)$ over $[a, b]$ and denote it by
$$\int_a^b f(x)\,dx.$$
A somewhat more precise formulation is as follows:

DEFINITION. The number $I$ is called the integral of $f(x)$ over the
interval $[a, b]$ if, for any $\varepsilon > 0$, there exists a $\delta > 0$ such that for any
partition of $[a, b]$ satisfying the condition $l < \delta$ and for any choice of
the points $\xi_k$, we have the inequality
$$\left|\sum_{k=1}^{n} f(\xi_k)\,(x_k - x_{k-1}) - I\right| < \varepsilon.$$
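The definition can be explored numerically. The sketch below is our own; the names, the example $f(x) = x^2$ on $[0, 1]$, and the random partitions are assumptions. It forms the sum (1) for randomly chosen partitions and points $\xi_k$. Because this $f$ is increasing, the sum (1) and the value $1/3$ both lie between $\sum f(x_{k-1})(x_k - x_{k-1})$ and $\sum f(x_k)(x_k - x_{k-1})$, which differ by at most $l\,(f(1) - f(0)) = l$; so the sum (1) should differ from $1/3$ by no more than $l$, whatever the partition and the points $\xi_k$.

```python
import random


def riemann_sum(f, points, tags):
    # The sum (1): f(xi_k) * (x_k - x_{k-1}), summed over the subintervals
    return sum(f(t) * (right - left) for left, right, t in zip(points, points[1:], tags))


def random_partition(a, b, n, rng):
    # a = x_0 <= x_1 <= ... <= x_n = b with n - 1 random interior points
    interior = sorted(rng.uniform(a, b) for _ in range(n - 1))
    return [a] + interior + [b]


def random_tags(points, rng):
    # An arbitrary point xi_k in each subinterval [x_{k-1}, x_k]
    return [rng.uniform(left, right) for left, right in zip(points, points[1:])]


def mesh(points):
    # l: the length of the largest subinterval
    return max(right - left for left, right in zip(points, points[1:]))


def square(x):
    return x * x


rng = random.Random(2024)
for n in (10, 100, 1000, 10000):
    pts = random_partition(0.0, 1.0, n, rng)
    total = riemann_sum(square, pts, random_tags(pts, rng))
    print(n, mesh(pts), total, abs(total - 1 / 3))
```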


In each case we see that we are dealing with a peculiar and quite 
complicated passage to the limit. The underlying process of 
this passage to the limit is difficult to describe in the usual manner, 
in terms of the behavior of some independent variable. We could 
take $l$ as such a variable (and describe the process with the help of
the symbol $l \to 0$), but we have then to remember that the sum (1),
whose limit we are discussing, is not a single-valued function of the
quantity $l$. It is easy to see that to a given value of $l$ there corresponds
an infinite number of different partitions and that to each of these
partitions there corresponds an infinite number of possible choices
of the points $\xi_k$. Thus for the limit $I$ to exist we must be able to make
the sum (1) differ arbitrarily little from $I$ merely by taking $l$
sufficiently small, irrespective of how we make these choices.

This complexity of the process underlying integration leads to 
certain inconveniences. In general theoretical constructions, as well 
as in concrete practical cases, it is possible to prove rigorously that 
the limit $I$ is independent of the nature of the partition and
the choice of the points $\xi_k$. These proofs, however, with their
cumbersome formalism, frequently make the reasoning very awkward.
For this reason, many modern presentations resort to defin- 
ing the integral in a somewhat different manner so as to avoid in- 
troducing, at the beginning, any passage to the limit. We shall now 
turn to this other formulation (which, of course, is formally equiv- 
alent to the previous one). 

Let $f(x)$ be an arbitrary function bounded on the closed interval
$[a, b]$, and let us denote its bounds by $M$ (least upper) and
$m$ (greatest lower). For any partition $T$ of $[a, b]$, let us denote
by $M_k$ and $m_k$ the least upper and greatest lower bounds, respectively,
of $f(x)$ in the interval $[x_{k-1}, x_k]$ and write
$$x_k - x_{k-1} = \Delta_k \qquad (1 \le k \le n).$$
Also let
$$S_T = \sum_{k=1}^{n} M_k\,\Delta_k$$
and
$$s_T = \sum_{k=1}^{n} m_k\,\Delta_k.$$
Thus, to each partition $T$ there corresponds a definite upper sum $S_T$
and a definite lower sum $s_T$. Since $m \le m_k \le M_k \le M$ $(1 \le k \le n)$,
we have, for any partition $T$, the inequalities $m(b - a) \le s_T \le
S_T \le M(b - a)$. The set of values of the upper sum $S_T$, as well as
of the lower sum $s_T$, for all possible different partitions of $[a, b]$ is,
therefore, a bounded set.

We call the greatest lower bound of the set of all upper sums $S_T$
the upper integral of $f(x)$ over $[a, b]$ (or between the limits $a$ and $b$)
and we denote it by the symbol
$$\bar I = \overline{\int_a^b} f(x)\,dx.$$
In the same way, we call the least upper bound of all lower
sums $s_T$ the lower integral of $f(x)$ over $[a, b]$ (or between the limits
$a$ and $b$) and denote it by
$$\underline{I} = \underline{\int_a^b} f(x)\,dx.$$


Thus, every bounded function defined on a closed interval has 
an upper and a lower integral over the given interval. These 
two integrals are defined as the bounds of certain sets and, as you 
see, without any appeal to the concept of a limit. 


DEFINITION. If the upper and the lower integral of $f(x)$ over $[a, b]$
coincide, we call their common value the integral of $f(x)$ over $[a, b]$
(or between the limits $a$ and $b$) and denote it by
$$I = \int_a^b f(x)\,dx.$$
The function $f(x)$ is then said to be integrable over $[a, b]$.
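For a nondecreasing function the bounds $M_k$ and $m_k$ are simply $f(x_k)$ and $f(x_{k-1})$, so $S_T$ and $s_T$ can be computed exactly. The sketch below uses our own example, $f(x) = x^2$ on $[0, 1]$ with the uniform partition into $n$ parts, and exact rational arithmetic. It yields $S_T = \dfrac{(n+1)(2n+1)}{6n^2}$ and $s_T = \dfrac{(n-1)(2n-1)}{6n^2}$, so that $S_T - s_T = 1/n$; both sums close in on $1/3$, consistent with the upper and the lower integral coinciding.

```python
from fractions import Fraction


def square(x):
    return x * x  # nondecreasing on [0, 1]


def uniform_partition(a, b, n):
    # a = x_0 < x_1 < ... < x_n = b with equal steps, in exact arithmetic
    return [a + (b - a) * Fraction(k, n) for k in range(n + 1)]


def upper_lower_sums_nondecreasing(f, points):
    # For nondecreasing f: M_k = f(x_k) and m_k = f(x_{k-1}) on [x_{k-1}, x_k]
    pairs = list(zip(points, points[1:]))
    upper = sum(f(right) * (right - left) for left, right in pairs)
    lower = sum(f(left) * (right - left) for left, right in pairs)
    return upper, lower


for n in (1, 10, 100, 1000):
    S, s = upper_lower_sums_nondecreasing(square, uniform_partition(Fraction(0), Fraction(1), n))
    print(n, S, s, float(S - s))
```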


To draw the needed conclusions from this definition, we shall 
prove some elementary lemmas. 
We shall say that a partition $T'$ of the interval $[a, b]$ is a refinement
of the partition $T$ if all the points of partition of $T$ are
also points of partition of $T'$. (In general, $T'$ also contains new
points of partition which do not belong to $T$.)

LEMMA 1. If $T'$ is a refinement of the partition $T$, then
$$S_{T'} \le S_T \quad\text{and}\quad s_{T'} \ge s_T.$$


Proof. In passing from the partition $T$ to its refinement $T'$,
each term $M_k\Delta_k$ of the sum $S_T$ is replaced by a group of terms
$$\sum_{r=1}^{s} M_{k,r}\,\Delta_{k,r},$$
where $\Delta_{k,1}, \Delta_{k,2}, \ldots, \Delta_{k,s}$ are the lengths of the subintervals formed
from the interval $[x_{k-1}, x_k]$ in passing from $T$ to $T'$, and $M_{k,r}$ is the
least upper bound of $f(x)$ in $\Delta_{k,r}$. Since $M_{k,r} \le M_k$ $(1 \le r \le s)$, we have
$$\sum_{r=1}^{s} M_{k,r}\,\Delta_{k,r} \le M_k \sum_{r=1}^{s} \Delta_{k,r} = M_k\,\Delta_k;$$
that is, the group of terms which replaces $M_k\Delta_k$ has a sum
not greater than $M_k\Delta_k$. And as this is true for all $k$, we have
$S_{T'} \le S_T$. (The second inequality is proved in a precisely analogous
manner.)


LEMMA 2. For any two partitions $T_1$ and $T_2$ of the interval $[a, b]$
we have the inequality
$$S_{T_1} \ge s_{T_2}.$$

Proof. The set of all points of partition of $T_1$ and $T_2$ determines
a third partition $T$, which is clearly a refinement of both $T_1$
and $T_2$. For the partition $T$, we clearly have $S_T \ge s_T$. Hence,
by Lemma 1,
$$S_{T_1} \ge S_T \ge s_T \ge s_{T_2}, \quad\text{q.e.d.}$$
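Lemmas 1 and 2 can be spot-checked with exact arithmetic. In the sketch below (our own construction) $f(x) = (x - \tfrac12)^2$ on $[0, 1]$, which is not monotonic. Being convex, it attains its least upper bound on any $[u, v]$ at an endpoint, and its greatest lower bound at $\tfrac12$ when $\tfrac12 \in [u, v]$, otherwise at the nearer endpoint. Random partitions with points on the grid $k/1000$, together with their common refinement, should then satisfy both lemmas exactly; the grid and the helper names are illustrative assumptions.

```python
import random
from fractions import Fraction

HALF = Fraction(1, 2)


def f(x):
    return (x - HALF) ** 2  # convex, not monotonic on [0, 1]


def sup_inf(left, right):
    # Convex f: the sup is at an endpoint; the inf is at 1/2 clamped into [left, right]
    return max(f(left), f(right)), f(min(max(HALF, left), right))


def upper_lower(points):
    # Exact S_T and s_T for the partition given by the sorted list of points
    S = s = Fraction(0)
    for left, right in zip(points, points[1:]):
        M_k, m_k = sup_inf(left, right)
        S += M_k * (right - left)
        s += m_k * (right - left)
    return S, s


def random_partition(n, rng):
    # Up to n interior points of [0, 1] taken from the grid k/1000
    interior = {Fraction(rng.randint(1, 999), 1000) for _ in range(n)}
    return [Fraction(0)] + sorted(interior) + [Fraction(1)]


def common_refinement(t1, t2):
    # All points of partition of T1 and T2 together
    return sorted(set(t1) | set(t2))


rng = random.Random(1)
T1, T2 = random_partition(4, rng), random_partition(6, rng)
T = common_refinement(T1, T2)
print(upper_lower(T1), upper_lower(T2), upper_lower(T))
```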
A direct corollary of Lemma 2 is the following: 


LEMMA 3. For every (bounded) function we have the inequality
$$\bar I \ge \underline{I}.$$

Proof. Since none of the sums $s_T$ exceeds any of the sums $S_T$,
the least upper bound $\underline{I}$ of the set of all $s_T$ cannot exceed the
greatest lower bound $\bar I$ of the set of all $S_T$.

Let us now denote by $l_T$ the length of the largest subinterval of
the partition $T$.


LEMMA 4. For any $\varepsilon > 0$ there exists a $\lambda > 0$ such that for
all partitions in which $l_T < \lambda$, we have $S_T < \bar I + \varepsilon$ and $s_T > \underline{I} - \varepsilon$.

Sometimes this lemma is expressed more briefly by saying that
$S_T \to \bar I$ and $s_T \to \underline{I}$ as $l_T \to 0$. Such a formulation cannot cause any
misunderstanding, if we simply remember that $S_T$ and $s_T$ are
not single-valued functions of the length $l_T$.


Proof. Without loss of generality, we may assume that $f(x) > 0$
for $a \le x \le b$. We can always achieve this by adding to $f(x)$ a
sufficiently large number $A$, whereupon all the integrals and sums will
increase by $A(b - a)$. By the definition of the greatest lower
bound, there exists a partition $T_0$ such that $S_{T_0} < \bar I + \dfrac{\varepsilon}{2}$. Let
us denote by $x_1, x_2, \ldots, x_n$ the interior points of partition of $T_0$,
and let $M$ be an upper bound for $|f(x)|$ in $[a, b]$. Finally, let us
introduce $\lambda = \dfrac{\varepsilon}{4nM}$, and let $T$ be any partition of $[a, b]$ for which
$l_T < \lambda$. We shall show that $S_T < \bar I + \varepsilon$.

For this purpose we shall divide the intervals formed from $[a, b]$
by the partition $T$ into two groups: in the first group (I) we
include every interval which is entirely contained in one of the
intervals $[x_k - \lambda, x_k + \lambda]$, $1 \le k \le n$, and in the second group (II) the
remaining intervals (Fig. 25). Clearly, every interval in the second
group is entirely contained in one of the subintervals of the partition
$T_0$ (an interval of length less than $\lambda$ containing some $x_k$ would lie
in $[x_k - \lambda, x_k + \lambda]$). Thus, the sum $S_T$ breaks up into the sums
$S_T^{\mathrm{I}}$ and $S_T^{\mathrm{II}}$. In the sum $S_T^{\mathrm{I}}$, all the terms have first factors not
greater than $M$, while the sum of the lengths of the intervals which
constitute the second factors is not greater than $2n\lambda = \dfrac{\varepsilon}{2M}$. Therefore,
$$S_T^{\mathrm{I}} \le M \cdot \frac{\varepsilon}{2M} = \frac{\varepsilon}{2}.$$

[Fig. 25: part of the $x$-axis showing the points $x_{k-1} - \lambda$, $x_{k-1}$, $x_{k-1} + \lambda$, $x_k - \lambda$, $x_k$, and the intervals of $T$ of types I and II.]

On the other hand, if we consider $S_T^{\mathrm{II},k}$, the portion of the sum
$S_T^{\mathrm{II}}$ contributed by the intervals of the partition $T$ which are contained
within some $[x_{k-1}, x_k]$ (Fig. 25), we see that its first factors are not
greater than $M_k$, and that the sum of its second factors is not
greater than $x_k - x_{k-1} = \Delta_k$ (here $M_k$ and $\Delta_k$ refer to the
subintervals of $T_0$). Since $M_k > 0$, the value of this portion
of $S_T^{\mathrm{II}}$ is therefore not greater than $M_k\Delta_k$. Consequently,
$$S_T^{\mathrm{II}} \le \sum_k M_k\,\Delta_k = S_{T_0}.$$
We thus have
$$S_T = S_T^{\mathrm{I}} + S_T^{\mathrm{II}} \le S_T^{\mathrm{I}} + S_{T_0} < \frac{\varepsilon}{2} + \bar I + \frac{\varepsilon}{2} = \bar I + \varepsilon,$$
which is what we set out to prove. And the second inequality
of our lemma is proved in an analogous manner.


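THEOREM 1. Let $f(x)$ be integrable over $[a, b]$. Then for any $\varepsilon > 0$
there exists a $\lambda > 0$ such that for any partition $T$ in which $l_T < \lambda$,
and for any choice of the points $\xi_k$ ($x_{k-1} \le \xi_k \le x_k$ and $1 \le k \le n$),
we have the inequality
$$\left|\sum_{k=1}^{n} f(\xi_k)\,\Delta_k - I\right| < \varepsilon,$$
where $I = \displaystyle\int_a^b f(x)\,dx$.

Proof. Since $m_k \le f(\xi_k) \le M_k$ $(1 \le k \le n)$ for any choice of
the points $\xi_k$ in the corresponding subintervals, we have
$$s_T \le \sum_{k=1}^{n} f(\xi_k)\,\Delta_k \le S_T.$$
On the other hand, by Lemma 4 we have
$$I - \varepsilon = \underline{I} - \varepsilon < s_T \le S_T < \bar I + \varepsilon = I + \varepsilon$$
for all partitions with sufficiently small $l_T$. Combining these
inequalities with those immediately above, we find that for all
partitions with sufficiently small $l_T$, we have
$$I - \varepsilon < \sum_{k=1}^{n} f(\xi_k)\,\Delta_k < I + \varepsilon.$$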
k=1 


The converse theorem is also true, as we can readily see. 
THEOREM 2. If there exists a number $I$ such that for any partition
$T$ with sufficiently small $l_T$ and with any choice of the points $\xi_k$,
the sum $\sum_{k=1}^{n} f(\xi_k)\,\Delta_k$ differs from $I$ by arbitrarily little, then the
function $f(x)$ is integrable over $[a, b]$.

Proof. Given any $\varepsilon > 0$, choose $T$ with $l_T$ so small that any sum
$\Sigma = \sum_{k=1}^{n} f(\xi_k)\,\Delta_k < I + \dfrac{\varepsilon}{2}$. We may choose $\xi_k$ in the interval
$[x_{k-1}, x_k]$ so that $M_k - f(\xi_k) < \dfrac{\varepsilon}{2(b - a)}$ (since $M_k$ is the least upper
bound of $f$ in this interval). Then $S_T - \Sigma = \sum_{k=1}^{n} (M_k - f(\xi_k))\,\Delta_k < \dfrac{\varepsilon}{2}$.
Thus $S_T < \Sigma + \dfrac{\varepsilon}{2} < I + \varepsilon$, and since $\bar I \le S_T$, $\bar I < I + \varepsilon$. Since
$\varepsilon$ is arbitrary, $\bar I \le I$. We may show similarly that $I \le \underline{I}$, so that
(using Lemma 3) $I \le \underline{I} \le \bar I \le I$ and equality must hold throughout.

Thus, the new definition of the integral is equivalent to the
original definition.
The following is clearly a corollary of Lemma 4: 


COROLLARY. If, by $\omega_k = M_k - m_k$ we denote the oscillation of
$f(x)$ in the interval $[x_{k-1}, x_k]$, then a necessary and sufficient condition
for the integrability of $f(x)$ over $[a, b]$ is that the sum
$$\sum_{k=1}^{n} \omega_k\,\Delta_k = S_T - s_T \qquad (2)$$
be arbitrarily small for all partitions $T$ with sufficiently small $l_T$.


42. CRITERIA FOR INTEGRABILITY 


The necessary and sufficient condition for integrability which 
was given in the foregoing corollary is usually not applied directly 
to any given function, as in most cases it is not easy to learn 
the behavior of the sum (2). But with the help of this condition, it 
is very easy to establish general criteria for integrability which 
apply to more or less wide classes of functions. 

First of all, we shall show that every function $f(x)$ continuous on
$[a, b]$ is integrable over this interval. For, by the property of uniform
continuity (Theorem 4 in Lecture 3, p. 57), it follows that for each
$\varepsilon > 0$, there exists a $\lambda > 0$ such that the oscillation of the function
$f(x)$ will be less than $\varepsilon$ in every interval of length less than $\lambda$. Then,
for any partition $T$ for which $l_T < \lambda$, all $\omega_k$ in the sum (2) will be
smaller than $\varepsilon$ and, consequently,
$$\sum_{k=1}^{n} \omega_k\,\Delta_k < \varepsilon \sum_{k=1}^{n} \Delta_k = \varepsilon\,(b - a);$$
that is, the sum (2) becomes arbitrarily small for sufficiently small
$l_T$. The integrability of $f(x)$ over $[a, b]$ follows.

Can a noncontinuous function be integrable? It is easy to show 
by examples that this is possible, and we can even establish certain 
general laws concerning this question. We shall show, for example, 
that a bounded function $f(x)$ which has only one point $c$ of
discontinuity in the interval $[a, b]$ is integrable over this interval
(regardless of the nature of the discontinuity).

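For this purpose, let us denote by $\mu$ an upper bound for $|f(x)|$
in $[a, b]$ and choose any $\varepsilon > 0$. The function $f(x)$ is continuous in
the intervals $\left[a, c - \dfrac{\varepsilon}{2\mu}\right]$ and $\left[c + \dfrac{\varepsilon}{2\mu}, b\right]$. Therefore, we may
again find a $\lambda > 0$ such that the oscillation of the function in any
interval of length less than $\lambda$ and wholly contained either in
$\left[a, c - \dfrac{\varepsilon}{2\mu}\right]$ or in $\left[c + \dfrac{\varepsilon}{2\mu}, b\right]$ will be less than $\varepsilon$. Now let $T$
be any partition of $[a, b]$ for which $l_T < \lambda$. In general, among the
intervals $\Delta_k$ there will be some that are wholly contained in one of
the two above-mentioned intervals, and some that will overlap the
interval $\left[c - \dfrac{\varepsilon}{2\mu}, c + \dfrac{\varepsilon}{2\mu}\right]$ (Fig. 26).

[Fig. 26: the interval $[a, b]$ with the points $c - \dfrac{\varepsilon}{2\mu}$ and $c + \dfrac{\varepsilon}{2\mu}$; intervals of type I lie outside the middle part, intervals of type II overlap it.]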


For the intervals $\Delta_k$ of the first type, the oscillations $\omega_k$ in the
sum (2) are less than $\varepsilon$, and, consequently, the part of the sum (2)
contributed by these intervals of the first type is less than $\varepsilon(b - a)$.
For the intervals $\Delta_k$ of the second type, we can say with regard to
the oscillations $\omega_k$ only that each of them is not greater than
$2\mu$. But since the intervals of the second type are all wholly contained
in the interval $\left[c - \dfrac{\varepsilon}{2\mu} - \lambda, c + \dfrac{\varepsilon}{2\mu} + \lambda\right]$, the sum of their
lengths does not exceed $\dfrac{\varepsilon}{\mu} + 2\lambda$; hence the corresponding part
of the sum (2) is not greater than $2\mu\left(\dfrac{\varepsilon}{\mu} + 2\lambda\right) = 2\varepsilon + 4\lambda\mu$. Combining
this with our estimate of the first part of (2), we find that,
for $l_T < \lambda$,
$$\sum_{k=1}^{n} \omega_k\,\Delta_k < \varepsilon\,(b - a) + 2\varepsilon + 4\lambda\mu.$$
As $\varepsilon$ and $\lambda$ can be taken arbitrarily small, the integrability of $f(x)$ over $[a, b]$
is established.
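The simplest instance of this argument is a function with a single jump. In the sketch below (our own example) $f(x) = 0$ for $x < c$ and $f(x) = 1$ for $x \ge c$ on $[0, 1]$, with $c = 1/3$. Here $f$ takes both values on $[x_{k-1}, x_k]$ exactly when $x_{k-1} < c \le x_k$, which happens for only one subinterval; so the sum (2) never exceeds $l_T$, and for the uniform partition into $n$ parts it equals $1/n$. The random grid of denominator 997 is an arbitrary choice.

```python
import random
from fractions import Fraction

C = Fraction(1, 3)  # the single point of discontinuity (our choice)


def oscillation(left, right):
    # f = 0 left of C and 1 from C on: both values occur on [left, right] iff left < C <= right
    return 1 if left < C <= right else 0


def oscillation_sum(points):
    # The sum (2): omega_k * Delta_k over all subintervals
    return sum(oscillation(left, right) * (right - left) for left, right in zip(points, points[1:]))


def mesh(points):
    # l_T: the length of the largest subinterval
    return max(right - left for left, right in zip(points, points[1:]))


def uniform_partition(n):
    return [Fraction(k, n) for k in range(n + 1)]


def random_partition(rng, size=20, denominator=997):
    # Interior points drawn from the grid k / denominator
    interior = {Fraction(rng.randint(1, denominator - 1), denominator) for _ in range(size)}
    return [Fraction(0)] + sorted(interior) + [Fraction(1)]


for n in (3, 10, 100):
    print(n, oscillation_sum(uniform_partition(n)))
pts = random_partition(random.Random(3))
print(oscillation_sum(pts), mesh(pts))
```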

We have deliberately carried out this simple reasoning in 
the fullest detail. It is important here to grasp the basic idea of the 
proof: the sum (2) turns out to be arbitrarily small because, for the 
intervals in which the function is continuous, the first factors wx are 
arbitrarily small, and in the intervals near the point of disconti- 
nuity the second factors yield an arbitrarily small sum. Therefore, 
both parts of the sum (2) are arbitrarily small, and hence the whole 
sum is also. It is now easy to understand (and prove rigorously) that 
the existence of any finite number of points of discontinuity of f(x) 
in $[a, b]$ does not interfere with its integrability, provided the function
remains bounded. On the other hand, if the points of discontinuity
constitute too large a part of the interval $[a, b]$, the function
may prove to be nonintegrable. For example, the Dirichlet function
(Lecture 3, p. 46), which is everywhere discontinuous, is not integrable
over any interval. This is so because for any partition we
have $\omega_k = 1$ for all $k$. Consequently, for any interval $[a, b]$ we
obtain
$$\sum_{k=1}^{n} \omega_k\,\Delta_k = \sum_{k=1}^{n} \Delta_k = b - a.$$

The question concerning the number of discontinuities which a
bounded function may have in an interval $[a, b]$ and still remain
integrable finds its complete solution in the following proposition.

THEOREM 3. A necessary and sufficient condition that a bounded
function $f(x)$ be integrable over the interval $[a, b]$ is the following: for
any $\varepsilon > 0$, all the points of $[a, b]$ at which the oscillation of $f(x)$ exceeds
$\varepsilon$ can be included in a finite number of intervals, the sum
of whose lengths is less than $\varepsilon$.
Proof. (i) Suppose the condition stated in the theorem is satisfied.
Let us denote by $\delta_1, \delta_2, \ldots, \delta_n$ a family of intervals containing
in their interiors all the points at which the oscillation of $f(x)$ exceeds
$\varepsilon$ and such that $\sum_{i=1}^{n} |\delta_i| < \varepsilon$, where $|\delta_i|$ denotes the length
of the interval $\delta_i$. We shall denote by $d_1, d_2, \ldots$ the family
of complementary intervals (that is, the family of intervals
obtained from $[a, b]$ by removing the intervals $\delta_1, \delta_2, \ldots, \delta_n$).
Since at each point in any of the intervals $d_i$ the oscillation of $f(x)$
does not exceed $\varepsilon$, it follows, from Theorem 9 on page 65 in Lecture 3,
that the oscillation of the function will be less than $2\varepsilon$
in any interval of sufficiently small length contained wholly in one
of the intervals $d_i$. Now we have only to imagine any sufficiently
fine partition (that is, having sufficiently small $l_T$) of the interval
$[a, b]$ into subintervals $\Delta_k$, and argue along the lines indicated on
page 142. Divide the sum (2) into two parts according to whether
the interval $\Delta_k$ is wholly contained in one of the intervals $d_i$
or whether it overlaps, even partially, one of the intervals $\delta_i$.
We readily find that the first part of the sum (2) is less than
$2\varepsilon(b - a)$, and that the second part of the sum (2) does not exceed
$2\mu(\varepsilon + 2n\,l_T)$, where $\mu$ is the least upper bound of $|f(x)|$ on $[a, b]$ and $n$
is the number of intervals $\delta_i$. Taking $\varepsilon$ and $l_T$ sufficiently small, we
can make the sum (2) as small as we please. We have thus proved
that the condition of the theorem is sufficient.

(ii) To prove that the condition is necessary, suppose $f(x)$ to be
integrable over $[a, b]$ and let $\varepsilon$ be any positive number. We can
choose a partition of $[a, b]$ such that the corresponding sum (2) will
be less than $\varepsilon^2$. We now observe that all the terms of the sum (2)
are nonnegative. Therefore, we shall not increase this sum if we
retain in it only those terms in which $\omega_k > \varepsilon$ and discard all the
remaining ones. Thus,
$$\varepsilon^2 > \sum_{k=1}^{n} \omega_k\,\Delta_k \ge \sum_{\omega_k > \varepsilon} \omega_k\,\Delta_k \ge \varepsilon \sum_{\omega_k > \varepsilon} \Delta_k,$$
and, consequently,
$$\sum_{\omega_k > \varepsilon} \Delta_k < \varepsilon.$$
But the intervals $\Delta_k$ included in this last sum clearly contain all the
points of $[a, b]$ at which the oscillation of the function exceeds
$\varepsilon$ (since such a point cannot belong to an interval $\Delta_k$ for which
$\omega_k \le \varepsilon$). Thus, the criterion for integrability formulated in our
theorem is seen to be necessary.


From the theorem just proved we have the following: 


COROLLARY. Every function monotonic on an interval [a, b] is
integrable over this interval. 


Proof. As we have seen in Lecture 3, every such function 
must be bounded in the given interval. Furthermore, no matter 
how small the positive number e, the number of points in [a, b] at 
which the oscillation of the function is greater than e must be 
finite. As it is always possible to contain a finite number of points 
in a finite family of intervals such that the sum of their lengths is 
arbitrarily small, the criterion of integrability in the theorem above 
is satisfied by every monotonic function. 
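For a monotonic function the smallness of the sum (2) can also be seen directly, by a telescoping estimate; this is a standard alternative, not the argument used in the text. If $f$ is nondecreasing, then $\omega_k \le f(x_k) - f(x_{k-1})$, so for the uniform partition into $n$ parts the sum (2) is at most $\dfrac{(f(b) - f(a))(b - a)}{n}$. The sketch below checks this with exact arithmetic for the example $f(x) = \lfloor 10x \rfloor/10 + x$ on $[0, 1]$ (our choice), which is nondecreasing, right-continuous, and has nine jumps; for it the bound is attained, $\sum \omega_k\,\Delta_k = 2/n$.

```python
import math
from fractions import Fraction


def f(x):
    # Nondecreasing on [0, 1], with jumps of 1/10 at x = 1/10, 2/10, ..., 9/10 (our example)
    return Fraction(math.floor(10 * x), 10) + x


def oscillation_sum_nondecreasing(g, points):
    # For nondecreasing right-continuous g, the oscillation on [x_{k-1}, x_k] is g(x_k) - g(x_{k-1})
    return sum((g(right) - g(left)) * (right - left) for left, right in zip(points, points[1:]))


def uniform_partition(n):
    return [Fraction(k, n) for k in range(n + 1)]


for n in (7, 10, 100, 1000):
    print(n, oscillation_sum_nondecreasing(f, uniform_partition(n)))
```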


43. GEOMETRIC AND PHYSICAL APPLICATIONS 


The concept of the integral considered above is closely con- 
nected (through the notion of limit or bound) with sums of a cer- 
tain type. Although, in itself, it is totally unrelated to the concepts 
of the differential calculus, it has, as you know, many geometric 
and physical applications. We shall not dwell on these applications 
as such, nor shall we even stop to enumerate the most important 
of these. It is, however, very important for us to notice a charac- 
teristic and rather subtle logical situation, frequently encountered 
in these applications, which is usually passed over too lightly in 
textbooks. 

Fig. 27

For example, let us consider the computation of the area of a
curvilinear trapezoid (Fig. 27) bounded above by the graph of a
function y = f(x), which for simplicity we assume to be positive
and continuous. You are, of course, familiar with this problem as 
well as with its solution by means of the integral calculus. But let 
us try to penetrate more deeply into the logical situation which we 
find here. The computation of the area of any particular figure 
makes sense only when the concept of area itself is exactly defined. 
But at this moment, when we are looking for a method which 
would permit us to compute the area of the curvilinear trapezoid 
of Figure 27, do we yet have such a definition? Can we precisely 
define that quantity which we plan to compute? Obviously not; we 
know the definition of area only for rectilinear figures (polygons) 
and certain parts of the circle. However, the curve y = f(x) which 
bounds our trapezoid has, in general, nothing in common with the 
circumference of a circle. 

How then, without knowing what we mean by the required 
area, can we proceed to its computation? And what is more amazing,
how is it that we succeed in performing this computation,
although we do not know logically what it is that we are comput- 
ing? The fact is that we both state and solve a problem, rather than 
carry out an ordinary computation: with the concept of the inte- 
gral, we simultaneously define the area for our curvilinear trapezoid 
and find a method for computing that area. For, when we state 
that the area of our figure is equal to the limit of areas of 
rectilinear step-shaped figures, one of which is represented in 
Figure 27, such an assertion is not a theorem, but a definition of the 
area of the figure. It would make no sense, therefore, to try to 
prove that statement. On the other hand, the assertion that the
above-mentioned limit exists under such and such conditions, for instance
the continuity of the function f(x), is a theorem which can and 
should be proved. And precisely the same may be said concerning 
the logical nature of every geometric and physical problem of
similar type. Whether we plan to compute the length of an arc, a
volume, the area of a surface of revolution, or the work of a given force over
a given portion of path, we are concerned with finding a quantita- 
tive measure for a concept which has previously been defined only 
for the simplest particular cases. The problem is to formulate, for 
this concept, a suitable general definition which will at the same time 
embody a method for computing the corresponding quantity. 

What are the common features of the problems of geometry 
and physics mentioned above (and many others) which make it 
possible to solve all these problems analytically in the same way?
We wish to use the integral as the tool which defines the desired 
quantity and simultaneously gives its value. Two basic features can 
be observed. In all cases, the quantity we seek depends on the in- 
terval [a, b] over which it extends, and it will vary as this interval 
is changed. Thus, in Figure 27 we obtain a different area if we re- 
place the segment [a, b] by another segment (of course, leaving the 
function unchanged). The work of a force is different over various 
parts of the path described by the point in motion, and so on. On 
the other hand, the quantity under consideration in each particular 
problem depends on some function f(x). In Figure 27, it is the 
ordinate of the point on the upper boundary of the trapezoid 
whose abscissa is equal to x; in the problem of computing work, it 
is the value of the force acting at the distance x from the origin; 
and so on. 

Thus in formulating a problem of this type it is necessary first
of all to set up a function f(x) and an interval a ≤ x ≤ b to which
our problem will refer. We can say that the quantity whose
definition and value we seek is a function V(f; a, b) of three elements
which can be chosen independently of one another: the function
f(x) and the numbers a and b. It is easy to see that the application
of the integral as a method of solving all the above-mentioned
problems depends basically upon the following properties of this
function V(f; a, b).

Property 1. As a function of an interval [a, b], the quantity V is
additive, that is, for a < c < b we have

$$V(f; a, b) = V(f; a, c) + V(f; c, b).$$


Indeed, under any reasonable definition, the area of Figure 27 
should be equal to the sum of the areas of the curvilinear 
trapezoids into which that area breaks up when we subdivide the 
interval [a, b] into two intervals. The volume of a solid of revolu- 
tion is equal to the sum of the volumes of the solids generated 
about the separate segments into which we may divide the axis of 
revolution; the length of an arc is equal to the sum of the lengths 
of its parts; the work of a force over a given path is equal to 
the sum of the work performed by the force on each separate por- 
tion of that path; and so on. 


Property 2. If the function f(x) is constant on [a, b], that is,
f(x) = C, then

$$V(f; a, b) = C(b - a).$$

Indeed, if f(x) = C (a ≤ x ≤ b), then the figure in Figure 27 is a
rectangle whose area is equal to C(b − a). In the case of a solid of
revolution, if f(x) represents the area of its cross section, and if
f(x) = C (a ≤ x ≤ b), then we have a cylinder, whose volume is
equal to C(b − a); if the force acting on a point remains constant
over the path a ≤ x ≤ b, the work of this force over the given path
will be equal to C(b − a); and so on.

It is now readily seen that in all cases in which the dependence
of the desired quantity V upon the given elements f(x), a, and
b has Properties 1 and 2, it is natural to expect that the
solution of the problem will be the integral

$$V = \int_a^b f(x)\,dx.$$


For, if we subdivide the given interval, as usual, by the points of
partition

$$a = x_0 < x_1 < \cdots < x_n = b,$$

then, by Property 1,

$$V(f; a, b) = \sum_{k=1}^{n} V(f; x_{k-1}, x_k).$$


And, if the function f(x) had for all x in the interval $[x_{k-1}, x_k]$ the
constant value $C_k$, then, by Property 2, we would have

$$V(f; x_{k-1}, x_k) = C_k (x_k - x_{k-1}).$$


In general, the function f(x) is not constant on $[x_{k-1}, x_k]$. If the
function is continuous, however, and if the interval $[x_{k-1}, x_k]$ is
very small, then values of f(x) in this interval differ very little from
one another. Taking one of these, $f(\xi_k)$, we may say that in the
whole interval $[x_{k-1}, x_k]$ the function f(x) is approximately equal
to $f(\xi_k)$, and, consequently, it is natural to consider the quantity
$V(f; x_{k-1}, x_k)$ (we must not forget that this quantity is not yet
determined) as approximately equal to $f(\xi_k)(x_k - x_{k-1})$. Thus, we
have the approximation


$$V(f; a, b) \approx \sum_{k=1}^{n} f(\xi_k)(x_k - x_{k-1}),$$

where $\xi_k$ is an arbitrary point in $[x_{k-1}, x_k]$. We assume,
furthermore, that the error of this approximation becomes infinitesimal as
the intervals of partition become smaller. We are thus led, in a
natural manner, to define the precise value of the quantity V as

$$V(f; a, b) = \lim \sum_{k=1}^{n} f(\xi_k)(x_k - x_{k-1}) = \int_a^b f(x)\,dx.$$
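The definition can be tried out numerically. Taking, as an assumed test case, the area under y = x² over [0, 1] (which the definition assigns the value 1/3), the sketch below forms the sums of $f(\xi_k)(x_k - x_{k-1})$ for a random partition with randomly placed points $\xi_k$. It is an illustration of ours; the helper name `random_riemann_sum` is hypothetical. Since the slope of x² is at most 2 on [0, 1], the error cannot exceed twice the length of the largest interval.

```python
import random

def random_riemann_sum(f, a, b, n, rng):
    """Return (sum of f(xi_k) * (x_k - x_{k-1}), largest interval length)
    for a random partition of [a, b] into n pieces, with each point xi_k
    drawn at random from its own piece [x_{k-1}, x_k]."""
    xs = [a] + sorted(rng.uniform(a, b) for _ in range(n - 1)) + [b]
    total = 0.0
    mesh = 0.0
    for left, right in zip(xs, xs[1:]):
        xi = rng.uniform(left, right)
        total += f(xi) * (right - left)
        mesh = max(mesh, right - left)
    return total, mesh

rng = random.Random(0)  # fixed seed so the run is reproducible
for n in (10, 100, 10_000):
    total, mesh = random_riemann_sum(lambda x: x * x, 0.0, 1.0, n, rng)
    print(n, mesh, total)  # total approaches 1/3 as mesh shrinks
```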


44. RELATION OF INTEGRATION TO DIFFERENTIATION 


In the first stage of the development of the integral calculus, 
when the enormous importance of the connection between integra- 
tion and differentiation had not yet sufficiently permeated mathe- 
matical thought, problems of the type described above were solved 
by direct computation of the integral as the limit of a sum. By the 
choice of particularly convenient partitions and especially selected 
values of $\xi_k$, mathematicians tried in each case, that is, for each
particular function f(x), to make this computation, in general very 
cumbersome, as easy and simple as possible. 

A number of problems had already been solved in this manner 
in ancient times; to these were later added a series of new achieve- 
ments. Yet, in spite of this, until the relationship between integra- 
tion and differentiation became the basic method of computing 
integrals, all results remained scattered and each new problem 
required an essentially new method of attack. Integral calculus de- 
veloped its general and most valuable methods only in close 
collaboration with the theory of differentiation. 

You know, of course, the nature of this connection between in- 
tegration and differentiation: 


THEOREM 4. At each point x at which the function f(x) is
continuous (for a continuous function, in particular, this means everywhere)
the derivative of the integral

$$F(x) = \int_a^x f(u)\,du$$

is the integrand f(x).


Proof. The proof is very simple. Let ε be an arbitrarily small
positive number chosen beforehand. Since f is continuous at x, if
|h| is sufficiently small, then for

$$x - |h| \le u \le x + |h|$$

we have

$$f(x) - \varepsilon < f(u) < f(x) + \varepsilon.$$

Consequently,

$$h f(x) - \varepsilon |h| \le \int_x^{x+h} f(u)\,du \le h f(x) + \varepsilon |h|,$$

and, hence (dividing by h, whether h is positive or negative),

$$f(x) - \varepsilon \le \frac{1}{h}\int_x^{x+h} f(u)\,du = \frac{F(x+h) - F(x)}{h} \le f(x) + \varepsilon.$$

Since ε is arbitrarily small, it follows that

$$F'(x) = f(x).$$
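The proof shows that the difference quotient of F is simply the average of f over [x, x + h], and so differs from f(x) by no more than the ε of the proof, that is, by no more than f strays from f(x) on [x − |h|, x + |h|]. A small numerical sketch of ours checks this for both signs of h; it assumes Simpson's rule as a stand-in for the exact integral and uses f = exp at x = 0.5 purely as an example (the helper names are hypothetical).

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with n subintervals (n made even).
    For b < a it returns the signed value, minus the integral over [b, a]."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def difference_quotient(f, x, h):
    """(F(x + h) - F(x)) / h for F(t) = integral of f from a fixed point to t.
    By additivity this equals (1/h) * integral of f from x to x + h."""
    return simpson(f, x, x + h) / h

x = 0.5
for h in (0.1, -0.1, 0.001, -0.001):
    q = difference_quotient(math.exp, x, h)
    # the epsilon of the proof: how far exp strays from exp(x) within |h| of x
    eps = math.exp(x + abs(h)) - math.exp(x)
    print(h, q, abs(q - math.exp(x)) <= eps)
```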


By virtue of this relation, every differentiation formula yields a 
certain integration formula when read, so to speak, from right to 
left. But more important than this, many general rules for differ- 
entiation may be at least partially reversed, and thus lead to the 
most important general methods of integration. For example, the 
rule for the differentiation of an algebraic sum is completely revers- 
ible and yields the formula for the integration of an algebraic sum 
(which, of course, may also be obtained directly from the definition 
of the integral). The differentiation formula for a product leads to 
integration by parts, with whose effectiveness you are familiar, 
while the reverse of the chain rule for composite functions is the 
likewise familiar and even more powerful method of integration by 
substitution. 

The application of this entire group of methods makes it
possible to integrate a great number of elementary functions, in
particular, all rational functions. And if, nevertheless, we are unable to
find the integrals of many other elementary functions, it is not be- 
cause of the insufficient power of the methods we employ. It is for 
another reason, of incomparably greater theoretical significance. 
While the differentiation of elementary functions always leads 
to other elementary functions, the situation is completely different 
when we are integrating elementary functions. It very often
happens that the integral of such a function, although it exists, is
nevertheless not an elementary function and, therefore, cannot be
expressed by any elementary formula. This is the case, for instance,
with the integrals of such simple functions as - : and a 





Such a new function has to be studied without any other tool (at 
least initially) except the integral defining it. 

45. MEAN VALUE THEOREMS FOR INTEGRALS


In the differential calculus the term mean value theorem is
commonly applied to Lagrange’s theorem (Section 33), which states: If
the function f(x) is continuous on the interval [a, b] and
differentiable within it, there exists an interior point c of that interval such that

$$f(b) - f(a) = f'(c)(b - a).$$


In general, there is characteristically present in the formulation
of mean value theorems a certain number c (the mean value of the
quantity x between a and b), about which we know only that it lies
inside the interval [a, b], and nothing more. In this sense, Cauchy’s
formula (Lecture 5) as well as Taylor’s formula with the various
forms of the remainder also represent mean value theorems. Very
often, theorems of this kind are formulated using somewhat
different notation: one speaks of the interval [a, a + h], and the
unspecified interior point of that interval is denoted by a + θh, where the
double inequality 0 < θ < 1 is all that these theorems assert
concerning the number θ. The presence in the formulation of the
theorem of such an unspecified number satisfying the inequalities
0 < θ < 1 may be considered as a typical feature of mean value
theorems.

The role of mean value theorems in the integral calculus is 
no smaller than their role in the differential calculus. One of them, 
called the first mean value theorem, is undoubtedly known to you. 
In the simplest case this theorem states that if f(x) is continuous
on [a, b], then

$$\int_a^b f(x)\,dx = (b - a) f(c), \qquad (3)$$

where c is an interior point of [a, b]. We can prove this theorem by
the direct application of Lagrange’s theorem to the function

$$F(x) = \int_a^x f(u)\,du \qquad (a \le x \le b).$$
More general is the relation

$$m(b - a) \le \int_a^b f(x)\,dx \le M(b - a) \qquad (a < b), \qquad (4)$$

where m and M are, respectively, the lower and upper bounds
of f(x) on [a, b]. This relation holds for any bounded, integrable
function. From the point of view of estimating the integral,
the equality (3) does not give us anything more than the inequality
(4), as the location of the point c is not known.

A more general formulation is the following: 


FIRST MEAN VALUE THEOREM. If f(x) and φ(x) are continuous on
[a, b] and φ(x) > 0 there, then

$$\int_a^b f(x)\varphi(x)\,dx = f(c)\int_a^b \varphi(x)\,dx, \qquad (5)$$

where c again denotes some (unspecified) interior point of [a, b].

Proof. To prove formula (5), which in the particular case where
φ(x) = 1 coincides with formula (3), it is enough to apply Cauchy’s
formula in the interval [a, b] to the functions

$$F(x) = \int_a^x f(u)\varphi(u)\,du \quad\text{and}\quad \Phi(x) = \int_a^x \varphi(u)\,du.$$

This gives

$$\frac{\int_a^b f(x)\varphi(x)\,dx}{\int_a^b \varphi(x)\,dx} = \frac{F(b) - F(a)}{\Phi(b) - \Phi(a)} = \frac{F'(c)}{\Phi'(c)} = \frac{f(c)\varphi(c)}{\varphi(c)} = f(c),$$

that is, formula (5).
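Formula (5) says that f(c) is a weighted average of f with weight φ. As an illustration only, the Python sketch below takes the assumed example f(x) = x², φ(x) = x + 1 on [0, 1], computes the weighted average with Simpson's rule (exact here, since the integrands are polynomials of degree at most 3), and then locates c by bisection, which is valid because this particular f is increasing. The helper names are hypothetical.

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def bisect(g, lo, hi, tol=1e-12):
    """A root of the continuous function g on [lo, hi], assuming g(lo) < 0 < g(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Assumed example: f(x) = x**2 and positive weight phi(x) = x + 1 on [0, 1].
f = lambda x: x * x
phi = lambda x: x + 1
a, b = 0.0, 1.0
ratio = simpson(lambda x: f(x) * phi(x), a, b) / simpson(phi, a, b)  # (7/12) / (3/2)
c = bisect(lambda t: f(t) - ratio, a, b)  # f is increasing, so f(t) = ratio once
print(ratio, c, math.sqrt(7 / 18))  # c = sqrt(7/18), an interior point of [0, 1]
```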


A considerably sharper analytical tool is provided by the 
so-called second mean value theorem, which we shall now discuss. 
This theorem also concerns the integral

$$\int_a^b f(x)\varphi(x)\,dx, \qquad (6)$$

but only where one of the two factors in the integrand is monotonic
in the interval [a, b].

Suppose φ(x) is nonnegative and nonincreasing for a ≤ x ≤ b,
and, as usual, let T denote a partition $a = x_0 < x_1 < \cdots < x_n = b$
of [a, b]. We shall first show that the integral (6) is the limit as
$l_T \to 0$ of sums of the form

$$S = \sum_{k=1}^{n} \varphi(\xi_k) \int_{x_{k-1}}^{x_k} f(x)\,dx \qquad (x_{k-1} \le \xi_k \le x_k). \qquad (7)$$

Thus let μ denote the least upper bound of |f(x)| in [a, b],
and Δ the difference between the sum (7) and the integral (6).
From the assumption that φ(x) is nonincreasing, it follows that

$$|\Delta| = \left| \sum_{k=1}^{n} \int_{x_{k-1}}^{x_k} [\varphi(\xi_k) - \varphi(x)] f(x)\,dx \right| \le \mu \sum_{k=1}^{n} [\varphi(x_{k-1}) - \varphi(x_k)](x_k - x_{k-1}) \le \mu\, l_T\, [\varphi(a) - \varphi(b)],$$

and, consequently,

$$\Delta \to 0 \quad \text{as } l_T \to 0.$$


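Having made this observation, let us now transform the sum
(7), using Abel’s lemma (Lecture 4, p. 98). Setting

$$A_k = \int_a^{x_k} f(x)\,dx \qquad (k = 0, 1, 2, \ldots, n),$$

so that $A_0 = 0$, we find

$$S = \sum_{k=1}^{n} \varphi(\xi_k) \int_{x_{k-1}}^{x_k} f(x)\,dx = \sum_{k=1}^{n} \varphi(\xi_k)(A_k - A_{k-1}) = \sum_{k=1}^{n-1} A_k[\varphi(\xi_k) - \varphi(\xi_{k+1})] + A_n \varphi(\xi_n). \qquad (8)$$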


Now let M and m denote, respectively, the least upper bound and
the greatest lower bound of the function

$$\int_a^x f(u)\,du \qquad (9)$$

in the interval a ≤ x ≤ b. It is obvious that $m \le A_k \le M$
$(0 \le k \le n)$, and, since by the assumed properties of φ(x) all $A_k$ on
the right side of formula (8) are multiplied by nonnegative factors
whose sum is $\varphi(\xi_1)$, this formula gives

$$m\,\varphi(\xi_1) \le S \le M\,\varphi(\xi_1).$$


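But as $l_T \to 0$ (with the points $\xi_k$ chosen so that $\xi_1 > a$) we have

$$\varphi(\xi_1) \to \varphi(a + 0) \quad\text{and}\quad S \to \int_a^b f(x)\varphi(x)\,dx.$$

Consequently,

$$m\,\varphi(a + 0) \le \int_a^b f(x)\varphi(x)\,dx \le M\,\varphi(a + 0).$$

The case where φ(a + 0) = 0 is, of course, trivial; when
φ(a + 0) ≠ 0 the foregoing is equivalent to

$$m \le \frac{1}{\varphi(a + 0)} \int_a^b f(x)\varphi(x)\,dx \le M.$$

Since the function (9) is continuous, it must attain at some interior¹
point ξ of the interval [a, b] a value (intermediate between its
upper and lower bounds) equal to the middle member of the
inequality above; whence,

$$\int_a^b f(x)\varphi(x)\,dx = \varphi(a + 0)\int_a^{\xi} f(x)\,dx \qquad (a < \xi < b). \qquad (10)$$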


This formula expresses the content of the second mean value
theorem under the condition that φ(x) is nonnegative and
nonincreasing and that f(x) is any integrable function. The mean value ξ
appears here in the role of one of the limits of integration.
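Formula (10) can be checked numerically. The sketch below is our own illustration, not part of the text: it approximates integrals by Simpson's rule, assumes φ is continuous at a (so that φ(a + 0) = φ(a)), and takes f(x) = cos x, φ(x) = e^(−x) on [0, π] as a hypothetical example, for which the left side of (10) equals (1 + e^(−π))/2. The search for ξ looks for a sign change of $G(t) = \int_a^t f - \text{target}$ on a grid and then bisects; a zero at which G merely touches 0 would be missed, so this is a sketch rather than a general root finder.

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def second_mean_value_point(f, phi, a, b, scan=64, steps=60):
    """Hypothetical helper: find xi in (a, b) with
    integral_a^b f*phi = phi(a) * integral_a^xi f   (formula (10)),
    assuming phi >= 0, nonincreasing, and continuous at a."""
    target = simpson(lambda x: f(x) * phi(x), a, b) / phi(a)
    G = lambda t: simpson(f, a, t) - target
    ts = [a + (b - a) * j / scan for j in range(scan + 1)]
    for lo, hi in zip(ts, ts[1:]):
        if (G(lo) <= 0) != (G(hi) <= 0):  # sign change between grid points
            break
    else:
        raise ValueError("no sign change of G found on the scan grid")
    for _ in range(steps):  # bisection keeps the sign change inside [lo, hi]
        mid = (lo + hi) / 2
        if (G(mid) <= 0) == (G(lo) <= 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

f = math.cos
phi = lambda x: math.exp(-x)  # nonnegative and nonincreasing on [0, pi]
xi = second_mean_value_point(f, phi, 0.0, math.pi)
lhs = simpson(lambda x: f(x) * phi(x), 0.0, math.pi)
print(xi, lhs, phi(0.0) * simpson(f, 0.0, xi))  # the last two values agree
```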

If we assume that φ(x) is again nonnegative, but this time
nondecreasing, then, setting x = b − y and φ(b − y) = ψ(y), we have

$$\int_a^b f(x)\varphi(x)\,dx = \int_0^{b-a} f(b - y)\psi(y)\,dy.$$


¹ That we actually have a < ξ < b rather than a ≤ ξ ≤ b needs further argument,
which is left to the reader. It is assumed, of course, that φ(x) is not constant. If
it were, we could no longer assert that ξ is an interior point of [a, b]. To see this, we
merely need to consider the case in which f(x) > 0 on [a, b].

Since the function ψ(y) is nonnegative and nonincreasing for
0 ≤ y ≤ b − a, we may apply formula (10) to the last integral
above:

$$\int_0^{b-a} f(b - y)\psi(y)\,dy = \psi(+0)\int_0^{\eta} f(b - y)\,dy = \varphi(b - 0)\int_{\xi}^{b} f(x)\,dx,$$

where 0 < η < b − a and, consequently, a < ξ = b − η < b. Thus,
the second mean value theorem now takes the form

$$\int_a^b f(x)\varphi(x)\,dx = \varphi(b - 0)\int_{\xi}^{b} f(x)\,dx.$$


Finally, let us assume that φ(x) is monotonic, for example
nonincreasing, but let us set no restrictions upon its sign. Since
the function φ(x) − φ(b − 0) is nonnegative and nonincreasing on
the interval [a, b], applying formula (10) we obtain

$$\int_a^b f(x)[\varphi(x) - \varphi(b - 0)]\,dx = [\varphi(a + 0) - \varphi(b - 0)]\int_a^{\xi} f(x)\,dx,$$

from which, on adding $\varphi(b - 0)\int_a^b f(x)\,dx$ to both sides, we obtain the

SECOND MEAN VALUE THEOREM. Let φ(x) be monotonic for a ≤ x ≤ b,
and let f(x) be an integrable function. Then

$$\int_a^b f(x)\varphi(x)\,dx = \varphi(a + 0)\int_a^{\xi} f(x)\,dx + \varphi(b - 0)\int_{\xi}^{b} f(x)\,dx$$

for some ξ such that a < ξ < b. If, in particular, the function φ(x)
is continuous at the points a and b, then

$$\int_a^b f(x)\varphi(x)\,dx = \varphi(a)\int_a^{\xi} f(x)\,dx + \varphi(b)\int_{\xi}^{b} f(x)\,dx$$

for some ξ such that a < ξ < b.


46. IMPROPER INTEGRALS 


We shall now direct our attention to two extremely important 
generalizations of the original notion of an integral: integrals with 
infinite limits of integration and integrals of unbounded functions. 
In both cases, we have an actual extension of the theory, and not 
simply an application of the idea under new conditions; to the 
original structure of the integral there is added a supplementary 
passage to the limit. This generalized integral no longer represents 
the limit or bound of a particular kind of sum, but the limit
of integrals. 


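As you know, the symbols

$$\int_a^{+\infty}, \qquad \int_{-\infty}^{b}, \qquad \int_{-\infty}^{+\infty}$$

are, respectively, defined as

$$\lim_{b \to +\infty} \int_a^b, \qquad \lim_{a \to -\infty} \int_a^b, \qquad \lim_{\substack{a \to -\infty \\ b \to +\infty}} \int_a^b.$$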


In order to study better the underlying idea of this
generalization, we shall consider a simple case. Let f(x) be positive and
nonincreasing for x ≥ a (Fig. 28). We can then represent the integral

$$\int_a^b f(x)\,dx$$

geometrically by the shaded area in Figure 28. As b increases, this
area increases; as b → +∞, the area may increase indefinitely, or
it may remain bounded and, consequently, tend to a limit. It is this
limit which is designated by the improper integral

$$\int_a^{+\infty} f(x)\,dx.$$

As its geometric illustration, we may use the area lying to the right
of the line x = a and bounded by the x-axis and the curve y = f(x).
This area may be finite, even though the corresponding figure
extends indefinitely.

Fig. 28
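For a positive nonincreasing integrand the whole question is whether $\int_a^b f(x)\,dx$ stays bounded as b grows. The sketch below is an illustration of ours with the assumed example integrands 1/x² and 1/x on [1, b]; it computes the partial integrals with Simpson's rule over the blocks [1, 2], [2, 4], [4, 8], ..., so that the step grows with x where these functions vary slowly. The first settles near 1 (it equals 1 − 1/b), the second keeps growing like ln b, so only the first improper integral exists. The helper `integral_to` is hypothetical.

```python
import math

def simpson(f, a, b, n=64):
    """Composite Simpson rule on [a, b] with n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def integral_to(f, a, b, n=64):
    """Integral of f over [a, b] for 0 < a < b, summed over the blocks
    [a, 2a], [2a, 4a], ... (the last one cut off at b), Simpson on each."""
    total, left = 0.0, a
    while left < b:
        right = min(2 * left, b)
        total += simpson(f, left, right, n)
        left = right
    return total

for b in (10.0, 1e3, 1e6):
    bounded = integral_to(lambda x: 1 / x**2, 1.0, b)  # 1 - 1/b, tends to 1
    growing = integral_to(lambda x: 1 / x, 1.0, b)     # ln b, unbounded
    print(b, bounded, growing)
```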

The simplest case of the second generalization (the integral of 
an unbounded function), although it appears to be the solution of 
a completely different problem, basically differs very little from the
case just discussed. This is seen immediately if we observe that its
geometric representation may be obtained simply by the reflection
of Figure 28 about the bisector of the angle which determines the
first quadrant. If, for example, the function f(x) becomes
unbounded in the neighborhood of the point a, we define the integral

$$\int_a^b f(x)\,dx$$

as the limit of the ordinary integral $\int_{a+\varepsilon}^{b} f(x)\,dx$ as ε → +0 (Fig. 29).

Fig. 29

You can see that here again the problem is the possibility of 
ascribing a definite value to the area of a figure which extends 
to infinity. This figure is of the same type as the first figure, only 
differently situated. That the approximating areas increase differ- 
ently in this case has no essential significance. 
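The reflected picture can be explored in the same way. In the illustrative sketch below (assumed example integrands 1/√x and 1/x, unbounded near a = 0, with b = 1), the ordinary integral over [ε, 1] is computed for shrinking ε, using Simpson's rule on the blocks [1/2, 1], [1/4, 1/2], ..., so that the step shrinks toward the singular point. The first tends to 2 (it equals 2 − 2√ε), the second grows like ln(1/ε). The helper `integral_from` is hypothetical.

```python
import math

def simpson(f, a, b, n=64):
    """Composite Simpson rule on [a, b] with n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def integral_from(f, eps, b, n=64):
    """Integral of f over [eps, b] for 0 < eps < b, summed over the blocks
    [b/2, b], [b/4, b/2], ... moving toward eps (the last cut off at eps)."""
    total, right = 0.0, b
    while right > eps:
        left = max(right / 2, eps)
        total += simpson(f, left, right, n)
        right = left
    return total

for eps in (1e-2, 1e-4, 1e-8):
    bounded = integral_from(lambda x: 1 / math.sqrt(x), eps, 1.0)  # 2 - 2*sqrt(eps)
    growing = integral_from(lambda x: 1 / x, eps, 1.0)             # ln(1/eps)
    print(eps, bounded, growing)
```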

In the general case, that is, when the behavior of the function 
f(x) is unrestricted,¹ the definitions of the improper integrals
do not change, although such a simple geometric interpretation is 
no longer available. When the corresponding limit exists, we say 
that the given improper integral exists, or that it has a meaning, or 
that it converges. Thus, the question of the convergence of an im- 
proper integral of the first or second kind is always reduced to the 
problem of the existence of a limit of some given function. Hence, 
all general propositions in the theory of limits can be applied 
to improper integrals. In particular, Cauchy’s condition is valid in 
this connection, taking the form: 


¹ We still require, of course, that the unboundedness of the integrand occurs in
the neighborhood of only one point.

CAUCHY’S CONDITION.¹ If f(x) is integrable over [a, b] for any
finite b > a, then, for the convergence of the improper integral

$$\int_a^{+\infty} f(x)\,dx,$$

it is necessary and sufficient that, given any ε > 0, we have the
inequality

$$\left| \int_{b_1}^{b_2} f(x)\,dx \right| < \varepsilon$$

for all sufficiently large $b_1$ and $b_2$.


In exactly the same way, we have the following two important 
theorems, presenting, as you will see at once, a complete analogy 
to the corresponding theorems in the theory of series: 


THEOREM 5. If, for a ≤ x < +∞, we have 0 ≤ f(x) ≤ φ(x),
where f(x) and φ(x) are integrable over any interval [a, b] (b > a),
then, from the convergence of the integral

$$\int_a^{+\infty} \varphi(x)\,dx,$$

there follows the convergence of the integral

$$\int_a^{+\infty} f(x)\,dx.$$


This theorem may be called the comparison test for integrals. It 
is entirely analogous to the comparison test for series (Lecture 
4, p. 76), and the proof can be carried out easily following 
the method used there. 


THEOREM 6. From the convergence of the integral

$$\int_a^{+\infty} |f(x)|\,dx, \qquad (11)$$

there follows the convergence of the integral

$$\int_a^{+\infty} f(x)\,dx. \qquad (12)$$


¹ For brevity, here and in what follows, only one type of improper integral is employed.
But all that is said applies, with suitable modifications (which you can easily formulate),
to the other type as well.




This theorem is completely analogous to the corresponding 
theorem in Lecture 4 (p. 81), and can be proved in the same man- 
ner. Here, too, we say that the integral (12) is absolutely convergent 
if the integral (11) is convergent. 

With the help of the above theorems, it is easy to establish the
convergence of a great number of frequently encountered integrals.
Thus, from the obvious convergence of the integral

$$\int_1^{+\infty} \frac{dx}{x^2}$$

there follows, by Theorem 5, the convergence of the integral

$$\int_1^{+\infty} \frac{|\sin x|}{x^2}\,dx,$$

and from this, by Theorem 6, the convergence (which is absolute)
of the integral

$$\int_1^{+\infty} \frac{\sin x}{x^2}\,dx.$$

If, in Theorem 5, we take for g(x) various positive functions 
whose integrals are known to converge, the theorem will im- 
mediately yield a whole series of convergence tests for integrals 
with positive integrands. Then, with the help of Theorem 6, we 
shall obtain convergence tests for integrands of arbitrary sign. 
Some of these tests are undoubtedly known to you and we shall 
not dwell on them. Instead, we shall examine two criteria less 
widely known, but more delicate, as they concern integrals which 
may not be absolutely convergent (from which it follows that these 
criteria cannot be derived from the theorems above).


Criterion 1. Let φ(x) be a monotonic function such that
$\lim_{x \to +\infty} \varphi(x) = 0$, and let the integral

$$F(x) = \int_a^x f(u)\,du$$

remain bounded as x → +∞. Then the integral

$$\int_a^{+\infty} f(x)\varphi(x)\,dx \qquad (13)$$

converges.

Proof. By the second mean value theorem, for $a < b_1 < b_2 < +\infty$
there exists a point β within the interval $[b_1, b_2]$ such that

$$\int_{b_1}^{b_2} \varphi(x) f(x)\,dx = \varphi(b_1 + 0)\int_{b_1}^{\beta} f(x)\,dx + \varphi(b_2 - 0)\int_{\beta}^{b_2} f(x)\,dx = \varphi(b_1 + 0)[F(\beta) - F(b_1)] + \varphi(b_2 - 0)[F(b_2) - F(\beta)].$$

Since for $b_1 \to +\infty$ and $b_2 \to +\infty$ the quantities $\varphi(b_1 + 0)$ and
$\varphi(b_2 - 0)$ tend, by hypothesis, to zero, while the values of the
other factors remain bounded, the right side, and consequently the
left side also, of the last equality tends to zero. It follows
from Cauchy’s condition that the integral (13) converges.


EXAMPLE 1. The integral

$$\int_1^{+\infty} \frac{\sin x}{x}\,dx \qquad (14)$$

is convergent. This follows directly from Criterion 1 if we substitute
$\varphi(x) = x^{-1}$, f(x) = sin x, and F(x) = cos 1 − cos x.


It is easy to show, however, that the convergence of the inte- 
gral (14) is not absolute. 
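A numerical look at Example 1, as a sketch of ours with Simpson's rule standing in for exact integration. The reference value π/2 − Si(1) ≈ 0.6247 for the integral (14) is quoted, not derived in the text: π/2 is the classical value of $\int_0^{+\infty} (\sin x)/x\,dx$ and Si(1) ≈ 0.946083 is the sine integral at 1. Integration by parts shows that the tail beyond b is at most 2/b in absolute value, so the partial integrals of (sin x)/x must settle within 2/b of the limit, while those of |sin x|/x keep growing, roughly like (2/π) ln b. The helper `partial` is hypothetical.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def partial(f, b, per_unit=200):
    """Integral of f over [1, b], using about per_unit Simpson steps per unit length."""
    return simpson(f, 1.0, b, int(per_unit * (b - 1)))

SI_1 = 0.946083070367183      # Si(1), the integral of sin t / t over [0, 1]
limit = math.pi / 2 - SI_1    # quoted value of the integral (14)

for b in (10.0, 100.0, 1000.0):
    signed = partial(lambda x: math.sin(x) / x, b)         # settles near `limit`
    absolute = partial(lambda x: abs(math.sin(x)) / x, b)  # keeps growing
    print(b, signed, absolute)
```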


Criterion 2. If φ(x) is monotonic and bounded, then the
convergence of the integral (12) implies the convergence of the integral (13).

Proof. To prove this, we apply once more the second mean
value theorem. In the first equality of the proof of Criterion 1, the
quantities $\varphi(b_1 + 0)$ and $\varphi(b_2 - 0)$ remain bounded as $b_1 \to +\infty$ and
$b_2 \to +\infty$, while both integrals tend to zero by virtue of Cauchy’s
condition and the assumed convergence of the integral (12). Thus,
the left side also tends to zero, from which, by Cauchy’s condition
again, we conclude that the integral (13) converges.


EXAMPLE 2. Setting

$$f(x) = \frac{\sin x}{x} \quad\text{and}\quad \varphi(x) = \arctan x,$$

we conclude from the proven convergence of the integral (14) that
the integral

$$\int_1^{+\infty} \frac{\sin x \,\arctan x}{x}\,dx$$

is also convergent.
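Cauchy’s condition, which drives both criteria, can be watched at work on Example 2. The sketch below is illustrative only, with Simpson's rule standing in for the exact integral. It evaluates the integral of the Example 2 integrand over $[b_1, 2b_1]$ for growing $b_1$ and compares it with the bound $2\pi/b_1$. That bound follows from the second mean value theorem, because $0 < \arctan x < \pi/2$ and, by an integration by parts, $\left|\int_c^d (\sin x)/x\,dx\right| \le 2/c$ for $0 < c < d$. The helper names are hypothetical.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def g(x):
    # integrand of Example 2: (sin x / x) * arctan x
    return math.sin(x) / x * math.atan(x)

def piece(b1, b2, per_unit=200):
    """Integral of g over [b1, b2], about per_unit Simpson steps per unit length."""
    return simpson(g, b1, b2, int(per_unit * (b2 - b1)))

for b1 in (10.0, 100.0, 1000.0):
    # second mean value theorem: |piece| <= (pi/2)(2/b1 + 2/beta) <= 2*pi/b1
    print(b1, piece(b1, 2 * b1), 2 * math.pi / b1)
```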

47. DOUBLE INTEGRALS


The special process of summation underlying the integral
calculus can also be applied successfully to functions of several
variables. In many fields of application, particularly in mechanics and
physics, such multidimensional integrals or, as they are more often 
called, multiple integrals play an important role. The theory of 
these integrals, although it contains almost nothing new in prin- 
ciple as compared with the theory of the ordinary integral, is very 
cumbersome in its formal aspect. In what follows, limiting our- 
selves to the two-dimensional case, we shall briefly demonstrate 
how the ideas presented at the beginning of this lecture can be ap- 
plied, in practically unchanged form, to the definition of multiple 
integrals and the establishment of a number of their properties. 

Let z = f(x, y) be a bounded function of two variables defined 
on a bounded closed¹ region D in the xy coordinate plane. Let M
and m denote, respectively, the upper and lower bounds of the 
function f(x, y) in the region D, whose area we agree to denote by 
|D|. As in the one-dimensional case, here also we shall consider 
various partitions T of the region D. Of course, we have here 
a much more complicated situation than before. In the one-dimen- 
sional case, both the basic region and the parts into which it was 
subdivided had one form: they were line segments. In the present 
case, the region D as well as the parts $\Delta_1, \Delta_2, \ldots, \Delta_n$ into which we
subdivide it may have a great variety of shapes. 

For our theory, it is desirable to impose as few restrictions as 
possible on the shape of these regions. All that is required is that 
each of them have a definite area and that the area of the common 
part (that is, the region of overlap) of any two different subregions 
$\Delta_i$ and $\Delta_k$ be equal to zero. Beyond this, we shall not impose, for
the time being, any special conditions on the partition T.

As in the one-dimensional case, let $M_k$ and $m_k$ designate,
respectively, the upper and lower bounds of f(x, y) in the subregion
$\Delta_k$. Let us set

$$S_T = \sum_{k=1}^{n} M_k |\Delta_k| \quad\text{and}\quad s_T = \sum_{k=1}^{n} m_k |\Delta_k|,$$

where $|\Delta_k|$ denotes here the area of $\Delta_k$. As before, we have

$$m|D| \le s_T \le S_T \le M|D|,$$

¹ That is, the boundary points are included in the region.

so that the sums $S_T$ and $s_T$ are bounded above and below. Let
$\overline{I}$ and $\underline{I}$ denote, respectively, the g.l.b. of all sums $S_T$ and the l.u.b.
of all sums $s_T$. Here also we shall call these numbers the upper
and lower integrals, respectively, of f(x, y) over D.

If $\overline{I} = \underline{I}$, then f(x, y) is said to be integrable over D and its
integral is

$$I = \overline{I} = \underline{I} = \iint_D f(x, y)\,dx\,dy.$$
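As an illustration of these definitions (our own sketch, with the assumed integrand f(x, y) = xy on the unit square and partitions T into n × n equal square cells), the upper and lower sums can be computed directly. Because this f is nondecreasing in each variable, its bounds on every cell are attained at the upper-right and lower-left corners. Here $S_T = ((n+1)/(2n))^2$ and $s_T = ((n-1)/(2n))^2$, so both tend to 1/4 and $S_T - s_T = 1/n$. The helper `darboux_sums_2d` is hypothetical.

```python
def darboux_sums_2d(f, n):
    """Upper and lower sums S_T, s_T of f over the unit square D = [0, 1]^2
    for the partition T into n * n equal square cells.

    Assumes f is nondecreasing in each variable, so on every cell the upper
    bound is at the upper-right corner and the lower bound at the lower-left."""
    h = 1.0 / n
    upper = lower = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            upper += f(i * h, j * h) * h * h              # M_k * |Delta_k|
            lower += f((i - 1) * h, (j - 1) * h) * h * h  # m_k * |Delta_k|
    return upper, lower

for n in (5, 50, 500):
    S, s = darboux_sums_2d(lambda x, y: x * y, n)
    print(n, s, S, S - s)  # both approach 1/4, and S - s = 1/n here
```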


Underlying the theory of such double integrals is a series of 
theorems exactly analogous to those we have already established 
for the one-dimensional case. We shall now briefly consider them, 
dwelling on the proofs only when they differ from the proofs given 
earlier. 

For lack of a better term, let us agree to use the name cells for
the subregions $\Delta_1, \Delta_2, \ldots, \Delta_n$ into which we subdivide the basic
region D. We shall understand by the diameter of a given cell
the least upper bound of the distances between all possible pairs of
points belonging to that cell. For a given partition T, we denote by
$d_T$ the greatest of the diameters of the cells $\Delta_1, \Delta_2, \ldots, \Delta_n$. It
is clear that this quantity must play the same role here as the
quantity $l_T$ in the one-dimensional case. Finally, we shall say that
the partition T′ is a refinement of the partition T if every cell of T′
is entirely contained in one of the cells of T.

We shall now establish four lemmas corresponding exactly 
to the four lemmas which we established in the one-dimensional 
case (Section 41). 


LEMMA 1′. If the partition T′ is a refinement of the partition T,
then

$$S_{T'} \le S_T \quad\text{and}\quad s_{T'} \ge s_T.$$

The proof follows that for the one-dimensional case verbatim.

LEMMA 2′. For any two partitions $T_1$ and $T_2$ we have

$$S_{T_1} \ge s_{T_2}.$$


The proof which we gave in the one-dimensional case applies 
here also. Some explanation is needed, however, in regard to
constructing a partition T which will serve, simultaneously, as a
refinement of each of the two given partitions $T_1$ and $T_2$. We simply
take as a cell of the partition T any set consisting of all points
belonging simultaneously to one of the cells of the partition $T_1$ and
to one of the cells of the partition $T_2$. Combining all such possible
pairs, we obtain all the cells of the partition T. From the very
definition of a refinement, it is obvious that T is simultaneously
a refinement of $T_1$ and $T_2$.


LEMMA 3′. $\overline{I} \ge \underline{I}$.

This proposition, as in the one-dimensional case, is a direct
corollary of Lemma 2′.


LEMMA 4′. For any ε > 0, there exists a δ > 0 such that, for any
partition T satisfying the condition $d_T < \delta$, we have

$$S_T < \overline{I} + \varepsilon \quad\text{and}\quad s_T > \underline{I} - \varepsilon.$$

In short, as $d_T \to 0$ the sums $S_T$ tend to their lower bound and the
sums $s_T$ tend to their upper bound.

The proof of this lemma does not differ in principle from that 
given in the one-dimensional case. Here, however, in view of the 
more complicated structure of the cells, a detailed presentation is 
necessary to make the proof formally irreproachable. 


Proof. First of all, since $\bar{I}$ was defined as the greatest lower bound of all the sums $S_T$, there exists a partition $T_0$ for which $S_{T_0} < \bar{I} + \varepsilon/2$. Let $\Delta_i$ designate any cell of this partition $T_0$. The set of all points of the plane whose distance from the perimeter of this cell does not exceed $\delta$ forms a kind of band along that perimeter. (In Fig. 30 the boundaries of this band are marked by dotted lines.) It is not difficult to see that, for sufficiently small $\delta$, the area of such a band is equal to $2\delta l_i$, where $l_i$ is the length of the boundary of the cell.¹ Let us set

$$L = \sum_i l_i,$$

the total length of the boundaries of all the cells of $T_0$. Then the area of the union $D_\delta$ of all the bands of the type described will not be greater than $2\delta L$. Now let $T$ be any partition for which $d_T < \delta$. We shall divide the cells of this partition into two groups. The first group contains each cell wholly contained in $D_\delta$, and the second contains all the remaining cells. We denote by $S_T^{(1)}$ and $S_T^{(2)}$ the parts of the sum $S_T$ which correspond to these two groups. We shall now estimate each of these sums separately. And, just as in the one-dimensional case, without restricting the generality of the argument we may assume that $f(x, y) \geq 0$ in the whole region $D$.

Since the first factors in the terms of the sum $S_T^{(1)}$ do not exceed $M$, and the sum of the areas of the cells included in the sum $S_T^{(1)}$ does not exceed the area of $D_\delta$ (which, as we have seen, is not greater than $2\delta L$), we have

$$S_T^{(1)} \leq 2ML\delta. \qquad (15)$$


As for the cells belonging to the sum $S_T^{(2)}$, each of them is contained in one of the cells $\Delta_i$ of the partition $T_0$. Indeed, a cell $\Delta$ of the second group that did not satisfy this condition would have to contain a point $P$ of the boundary of one of the cells $\Delta_i$. Being a cell of the second group, $\Delta$ cannot be entirely contained in the band surrounding this boundary, so it must contain a point $Q$ outside that band. But then the distance between the points $P$ and $Q$, and hence the diameter of the cell $\Delta$, would exceed $\delta$. This is impossible, since $d_T < \delta$.

We shall not decrease the sum $S_T^{(2)}$ if, in each of its terms $M_k|\Delta_k|$ (corresponding to the cell $\Delta_k$ of the partition $T$), we replace the first factor $M_k$ by the quantity $M_i$ (corresponding to the cell $\Delta_i$ of the partition $T_0$ containing the cell $\Delta_k$). This holds because $\Delta_k \subset \Delta_i$ implies $M_k \leq M_i$. On the other hand, since the function $f(x, y)$ is nonnegative, the sum $S_T^{(2)}$ in its altered form will still not exceed $S_{T_0}$. In this way, we obtain the inequality

$$S_T^{(2)} \leq S_{T_0} < \bar{I} + \frac{\varepsilon}{2}.$$

¹We are now tacitly imposing on the cells of our partitions more restrictive conditions than before. For example, we assume that each cell is a simply connected region whose perimeter is of finite length. Unfortunately, within the limits of these lectures we have no opportunity to discuss these problems in greater detail.


Combining this inequality with the inequality (15), and taking $\delta$ to be less than $\varepsilon/(4ML)$, we obtain

$$S_T = S_T^{(1)} + S_T^{(2)} < 2ML\delta + \bar{I} + \frac{\varepsilon}{2} < \bar{I} + \varepsilon.$$

In precisely the same way, we can prove that if $d_T < \delta$ then $s_T > \underline{I} - \varepsilon$, and this completes the proof of Lemma 4′.
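Here is a quick check of the band estimate on an example of our own choosing (it does not come from the lecture). Let the cell be the unit square, so that its perimeter is $l = 4$, and take $0 < \delta \leq 1/2$. The points within distance $\delta$ of the perimeter make up the square enlarged by $\delta$, whose area is $1 + 4\delta + \pi\delta^2$, minus the inner square of side $1 - 2\delta$. Therefore

$$\bigl(1 + 4\delta + \pi\delta^2\bigr) - (1 - 2\delta)^2 = 8\delta - (4 - \pi)\delta^2 < 2\delta \cdot 4 = 2\delta l.$$

For this cell, the band area agrees with $2\delta l$ up to a term of order $\delta^2$ and does not exceed it. That is all the bound on the area of the union of the bands requires.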


THEOREM 7. If $f(x, y)$ is integrable over the region $D$, then, for all partitions $T$ in which the maximum cell diameter $d_T$ is sufficiently small, we have

$$\left| \sum_{k=1}^{n} f(\xi_k, \eta_k)\,|\Delta_k| - I \right| < \varepsilon,$$

where $\varepsilon$ is any given positive number and $(\xi_k, \eta_k)$ $(k = 1, 2, \ldots, n)$ is an arbitrary point of the cell $\Delta_k$.
In short, the sum $\sum_k f(\xi_k, \eta_k)\,|\Delta_k|$ tends to the integral $I$ as $d_T \to 0$, for all possible choices of the partition $T$ as well as of the points $(\xi_k, \eta_k)$.


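Proof. For brevity, let us denote this sum simply by $\Sigma_T$. Then, from the obvious inequalities

$$m_k \leq f(\xi_k, \eta_k) \leq M_k,$$

we obtain, for any partition and any choice of the points $(\xi_k, \eta_k)$,

$$s_T \leq \Sigma_T \leq S_T.$$

Since $f$ is integrable, $\bar{I} = \underline{I} = I$. Hence, by Lemma 4′, we have for sufficiently small $d_T$

$$I - \varepsilon < \Sigma_T < I + \varepsilon, \quad\text{or}\quad |\Sigma_T - I| < \varepsilon.$$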


The converse of this theorem is also true, the proof being the 
same as for the one-dimensional case. 




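THEOREM 8. Just as in the one-dimensional case, the condition

$$S_T - s_T = \sum_{k=1}^{n} \omega_k |\Delta_k| \to 0 \quad\text{as}\quad d_T \to 0,$$

where $\omega_k = M_k - m_k$ is the oscillation of $f$ in the cell $\Delta_k$, is necessary and sufficient for the existence of the integral.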


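Proof. (i) If this condition is satisfied, then, by virtue of the inequality $S_T \geq \bar{I} \geq \underline{I} \geq s_T$ (valid for any partition), we have

$$\bar{I} - \underline{I} \leq S_T - s_T < \varepsilon$$

for sufficiently small $d_T$. Since $\varepsilon$ is arbitrary, it follows that $\bar{I} = \underline{I}$.
(ii) If the function $f(x, y)$ is integrable, then, by Lemma 4′, for sufficiently small $d_T$ we have

$$S_T < I + \varepsilon \quad\text{and}\quad s_T > I - \varepsilon,$$

whence

$$S_T - s_T < 2\varepsilon,$$

which means that $S_T - s_T \to 0$ as $d_T \to 0$.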
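The condition of Theorem 8 can be computed directly in a simple case. The Python sketch below is our own example: it takes $f(x, y) = xy$ on the unit square and the partition into $n \times n$ equal squares. Because $f$ increases in each variable there, the bounds $M_k$ and $m_k$ are attained at opposite corners of each cell, and the gap $S_T - s_T$ works out to exactly $1/n$.

```python
def darboux_sums(n):
    """Upper and lower sums S_T, s_T of f(x, y) = x * y over the unit square,
    for the partition T into n*n squares of side 1/n. Since f increases in x and
    in y there, M_k sits at a cell's upper-right corner and m_k at its lower-left."""
    h = 1.0 / n
    upper = lower = 0.0
    for i in range(n):
        for j in range(n):
            area = h * h
            upper += ((i + 1) * h) * ((j + 1) * h) * area   # M_k * |cell_k|
            lower += (i * h) * (j * h) * area               # m_k * |cell_k|
    return upper, lower


for n in (4, 40, 400):
    S, s = darboux_sums(n)
    print(n, S, s, S - s)   # the gap S - s equals 1/n for this f
```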

Finally, we prove the integrability of continuous functions by 
precisely the same reasoning as in the one-dimensional case. The 
basis of the proof is that a continuous function is uniformly con- 
tinuous in a bounded closed region (a bounded region including its 
boundary points). 


48. EVALUATION OF DOUBLE INTEGRALS 


In discussing ordinary integrals earlier, we mentioned that the 
summation process, though fundamental in the definition of an in- 
tegral, is almost useless for the actual computation of integrals. It 
is even less to be expected that double integrals could be computed 
readily by the process of two-dimensional summation described 
above. We have seen that powerful and general methods for com- 
puting one-dimensional integrals are obtained only by exploiting 
the connection between the integral and the differential calculus. 
Such relations may also be discovered for double integrals (and for 
multiple integrals in general). However, the most general and effective
method for computing double integrals is obtained by the
familiar process of reducing the problem to two successive one- 
dimensional integrations. Let us consider briefly how this can be 
done. 
Let us assume that the region $D$, on which the continuous function $f(x, y)$ is defined, is such that each line parallel to one of the coordinate axes intersects the perimeter in not more than two points (Fig. 31). The partitions $T$, in terms of which the integral $I$ of $f(x, y)$ is defined, may have cells of any shape, provided only that their diameters become arbitrarily small. We shall now construct a partition by using a network of lines parallel to the coordinate axes, so that the cells will be rectangles (except those lying along the perimeter of $D$). We shall denote by $a$ and $b$ the greatest lower and the least upper bounds, respectively, of the abscissas of the set of all points in $D$. Let $a = \xi_0 < \xi_1 < \cdots < \xi_n = b$ denote the abscissas of the partition lines parallel to the $y$-axis, and $\eta_0 < \eta_1 < \cdots$ the ordinates of the lines parallel to the $x$-axis. Finally, we set $\xi_i - \xi_{i-1} = h_i$ and $\eta_j - \eta_{j-1} = k_j$.

[Fig. 31. The region $D$, lying between a lower boundary curve $y = \varphi_1(x)$ and an upper boundary curve $y = \varphi_2(x)$; $O$ is the origin.]

Each cell of our partition either fills a rectangle $\xi_{i-1} \leq x \leq \xi_i$, $\eta_{j-1} \leq y \leq \eta_j$ completely, in which case we denote the cell $\Delta_{ij}$, or is contained in such a rectangle but does not fill it. We shall denote cells of the second type by $\Delta_1, \Delta_2, \ldots$. We shall also use $|\Delta_{ij}|$ and $|\Delta_r|$ to denote the areas of the corresponding cells. We now choose in each cell $\Delta_r$ a point $(X_r, Y_r)$ and form the sum

$$S = \sum_i \sum_j f(\xi_i, \eta_j)\,|\Delta_{ij}| + \sum_r f(X_r, Y_r)\,|\Delta_r| = S^{(1)} + S^{(2)}.$$

Let $\delta$ denote the largest of the numbers $h_i$ and $k_j$. Then, for sufficiently small $\delta$, the sum $S$ differs by arbitrarily little from the integral $I$ of $f(x, y)$ over $D$.
Let us now take a closer look at the troublesome rectangles which contain both points in $D$ and points not in $D$. The strip included between the lines $x = \xi_{i-1}$ and $x = \xi_i$ may contain such rectangles at the top and at the bottom. In Figure 32, we have schematically represented the upper end of such a strip.

[Fig. 32. Upper end of a strip $\xi_{i-1} \leq x \leq \xi_i$ (schematic), with the label $\omega_i$.]

These rectangles (we have three of them in Fig. 32) are contained in a rectangle whose base is equal to $h_i$ and whose altitude clearly does not exceed $\omega_i + 2\delta$, where $\omega_i$ is the oscillation of the function $\varphi_2(x)$ in the interval $[\xi_{i-1}, \xi_i]$. If $\delta$ is sufficiently small, then, because of the uniform continuity of the function $\varphi_2(x)$, all $\omega_i$, and hence all $\omega_i + 2\delta$ also, will be less than an arbitrarily small positive number $\mu$. Thus, the sum of the areas of the nonrectangular cells represented in Figure 32 will be less than $\mu h_i$. Consequently, the sum of the areas of all the cells adjacent to the upper half of the perimeter of region $D$ will be less than $\mu \sum_i h_i = \mu(b - a)$. We clearly obtain a similar estimate for the area of the cells adjacent to the lower half of the perimeter, so that, taken all together, the incomplete cells account for an area not exceeding $2\mu(b - a)$. Hence, $S^{(2)}$ does not exceed $2\mu M(b - a)$ in absolute value, where $M$ designates the upper bound of $|f(x, y)|$ in the region $D$. Thus, for $\delta$ sufficiently small, $S^{(1)}$ differs arbitrarily little from $S$.
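The estimate that the incomplete cells have a total area of order $\delta$ can be checked on a concrete region. The Python sketch below is our own illustration. It assumes the hypothetical region $D = \{0 \le x \le 1,\ x^2 \le y \le x\}$ and a square grid of side $1/n$. It adds up the areas of the grid squares that meet $D$ without being contained in it.

```python
import math


def boundary_cell_area(n):
    """Total area of the grid squares of side 1/n whose interiors meet the interior
    of D but which are not contained in D, where (as an illustrative assumption)
    D = {(x, y): 0 <= x <= 1, x**2 <= y <= x}."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x0, x1 = i * h, (i + 1) * h
        for j in range(n):
            y0, y1 = j * h, (j + 1) * h
            # Some x in (x0, x1) has x**2 < y1 and x > y0, i.e. the square meets D.
            meets = max(x0, y0) < min(x1, math.sqrt(y1))
            # Every point of the square satisfies x**2 <= y <= x.
            inside = x1 * x1 <= y0 and y1 <= x0
            if meets and not inside:
                total += h * h
    return total


for n in (25, 100, 400):
    print(n, boundary_cell_area(n))   # shrinks roughly in proportion to 1/n
```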

We now construct the sum

$$\tilde{S} = \sum_i \sum_j f(\xi_i, \eta_j)\,h_i k_j = \sum_i h_i \sum_j f(\xi_i, \eta_j)\,k_j,$$

where the summation is over all $i$ and $j$ such that $(\xi_i, \eta_j)$ is in $D$. The terms of $\tilde{S}$ consist of the terms of $S^{(1)}$ plus terms corresponding to rectangles $\xi_{i-1} \leq x \leq \xi_i$, $\eta_{j-1} \leq y \leq \eta_j$ that contain the point $(\xi_i, \eta_j)$ of $D$ but also contain points not in $D$. It follows from the argument above that, for sufficiently small $\delta$, $\tilde{S}$ differs arbitrarily little from $S^{(1)}$. Hence it also differs arbitrarily little from the integral $I$ of $f(x, y)$ over the region $D$.

If, while keeping the values $\xi_1, \xi_2, \ldots$ fixed, we decrease indefinitely all the differences $k_j = \eta_j - \eta_{j-1}$, then the inner sum $\sum_j f(\xi_i, \eta_j)\,k_j$ will clearly have as its limit the integral

$$\int_{\varphi_1(\xi_i)}^{\varphi_2(\xi_i)} f(\xi_i, y)\,dy.$$

For brevity, we shall denote this integral by $F(\xi_i)$, agreeing to write in general

$$F(x) = \int_{\varphi_1(x)}^{\varphi_2(x)} f(x, y)\,dy.$$

(While integrating, we consider $x$ as a constant, so that we integrate the function $f(x, y)$ as a function of one variable $y$.) We can take the quantities $k_j$ so small that, for each $i$, we shall have the inequality

$$\left| \sum_j f(\xi_i, \eta_j)\,k_j - F(\xi_i) \right| < \varepsilon,$$


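from which

$$\left| h_i \sum_j f(\xi_i, \eta_j)\,k_j - F(\xi_i)\,h_i \right| < \varepsilon h_i$$

and

$$\left| \sum_i h_i \sum_j f(\xi_i, \eta_j)\,k_j - \sum_i F(\xi_i)\,h_i \right| = \left| \tilde{S} - \sum_i F(\xi_i)\,h_i \right| < \varepsilon \sum_i h_i = \varepsilon(b - a).$$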
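The two stages of this argument can be traced numerically. First an inner sum over $j$ approximates $F(\xi_i)$, and then an outer sum over $i$ approximates the integral of $F$. The Python sketch below is our own illustration under these assumptions:

- The region is $x^2 \le y \le x$ with $0 \le x \le 1$. So $a = 0$, $b = 1$, $\varphi_1(x) = x^2$ and $\varphi_2(x) = x$.
- The integrand is $f(x, y) = x + y$.
- The grid is uniform, with $\xi_i = i/n$ and $\eta_j = j/n$.

The inner integral $F(x) = \frac32 x^2 - x^3 - \frac12 x^4$ was worked out by hand, and its integral over $[0, 1]$ is $3/20$.

```python
def f(x, y):
    # Illustrative integrand (an assumption, not from the text).
    return x + y


def F(x):
    # Inner integral of f(x, y) dy from phi1(x) = x**2 to phi2(x) = x, done by hand:
    # x * (x - x**2) + (x**2 - x**4) / 2.
    return 1.5 * x**2 - x**3 - 0.5 * x**4


def grid_sum(n):
    """Sum of f(xi_i, eta_j) * h * k over the grid points (xi_i, eta_j) = (i/n, j/n)
    lying in D = {0 <= x <= 1, x**2 <= y <= x}, computed as an outer sum over i
    of h times an inner sum over j; here every h_i and k_j equals 1/n."""
    h = k = 1.0 / n
    total = 0.0
    for i in range(1, n + 1):
        inner = 0.0
        for j in range(1, n + 1):
            # (i/n)**2 <= j/n <= i/n, checked exactly in integer arithmetic
            if i * i <= j * n <= i * n:
                inner += f(i * h, j * k) * k
        total += h * inner
    return total


def outer_sum(n):
    # Sum of F(xi_i) * h_i with xi_i = i/n and h_i = 1/n.
    h = 1.0 / n
    return sum(F(i * h) * h for i in range(1, n + 1))


EXACT = 3 / 20  # integral of F over [0, 1]: 1/2 - 1/4 - 1/10
for n in (10, 100, 400):
    print(n, grid_sum(n), outer_sum(n), EXACT)
```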





Finally, for sufficiently small $h_i$, the sum $\sum_i F(\xi_i)\,h_i$ clearly differs arbitrarily little from the integral

$$\int_a^b F(x)\,dx.$$

We see, then, that for a suitably chosen partition of $D$, all three differences $I - \tilde{S}$, $\tilde{S} - \sum_i F(\xi_i)\,h_i$ and $\sum_i F(\xi_i)\,h_i - \int_a^b F(x)\,dx$ become arbitrarily small. Hence the same is true of their sum $I - \int_a^b F(x)\,dx$. This latter sum is entirely independent of the choice of partition, so it must equal zero. Thus, we obtain

$$\iint_D f(x, y)\,dx\,dy = \int_a^b dx \int_{\varphi_1(x)}^{\varphi_2(x)} f(x, y)\,dy,$$
so that the computation of a double integral is actually reduced to 
two successive ordinary integrations. The value of the inner inte- 
gral depends, of course, upon x, but not on y. 

Instead of the order of integration we have used, we could also 
have chosen the reverse order. This fact sometimes has great prac- 
tical importance, since by changing the order of integration we 
may obtain functions which are much easier to integrate. 
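Here is a small worked illustration; the region and the integrand are our own choice, not the lecture's. Let $D$ be bounded by $y = x^2$ and $y = x$ for $0 \le x \le 1$, and let $f(x, y) = x + y$. Cut by vertical lines, $D$ is described by $x^2 \le y \le x$. Cut by horizontal lines, it is described by $y \le x \le \sqrt{y}$ with $0 \le y \le 1$. The two orders of integration give the same value:

$$\int_0^1 dx \int_{x^2}^{x} (x + y)\,dy = \int_0^1 \Bigl(\tfrac32 x^2 - x^3 - \tfrac12 x^4\Bigr)\,dx = \tfrac12 - \tfrac14 - \tfrac1{10} = \tfrac{3}{20},$$

$$\int_0^1 dy \int_{y}^{\sqrt{y}} (x + y)\,dx = \int_0^1 \Bigl(\tfrac12 y - \tfrac32 y^2 + y^{3/2}\Bigr)\,dy = \tfrac14 - \tfrac12 + \tfrac25 = \tfrac{3}{20}.$$

Here the two computations are about equally easy; the example only shows that they agree.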


49. THE GENERAL OPERATION OF INTEGRATION 


Our lecture has been a long one. Nevertheless, before conclud- 
ing it, we ought to take a look at the general relationship between 
differentiation and integration. This will throw additional light on 
the underlying nature of every process of integration, regardless of 
whether it involves an ordinary integral, a multiple integral, or an 
even more general and abstract integral. 

First of all, we are dealing with a given region D located in 
some space, the term space being understood here in a very broad 
sense. The space may be a straight line, or a plane, or our
usual three-dimensional space, or it may be any multidimensional 
space, or even some completely different kind of space. We do not 
intend here to give a general definition of this term. For our pur- 
pose, it is important only that, first, the distance between any two 
points be defined and, second, the region D and those of its parts 
(cells) with which we shall be concerned have extent. Depending 
upon the type of space, this extent is usually called length, 
area, volume, and so on. In general, we shall call it simply the 
measure of the given part of our space. 

Now let us imagine a substance distributed over the region D. 
Don’t let the word substance frighten you; we shall not attempt to 
define the concept expressed by it. This substance is for us simply 
something of which a definite portion is contained in each part of D. 
It may be mass, electrical charge, heat, or any other measurable quantity which can be distributed in various ways over $D$. It may be, for instance, the quantity of precipitation falling on a plane region $D$ during a given period of time. It is evident that, for the construction of our mathematical picture, the nature of this substance is absolutely irrelevant. We require only that a definite quantity $F(\Delta)$ of this substance be attributable to each of those parts $\Delta$ of the region $D$ with which we are concerned. If $D$ is divided into cells $\Delta_1, \Delta_2, \ldots, \Delta_n$, the total amount of the substance $F(D)$ is the sum of the amounts within the individual cells:

$$F(D) = F(\Delta_1) + F(\Delta_2) + \cdots + F(\Delta_n).$$


In all concrete interpretations of the formal scheme we have out- 
lined, an essential role is played by the concept of the density 
of the substance at a given point P. This density, from the point of 
view of mathematics, can be defined, as we shall now see, by a 
process of differentiation. 

If $\Delta$ is any part of $D$ having the measure $m(\Delta)$, then it is natural to call the ratio $F(\Delta)/m(\Delta)$ the average density of our substance in the subregion $\Delta$. It is the amount of substance in $\Delta$ per unit of extent (measure).

Now let $P$ be any point in $D$. If we surround this point by a subregion $\Delta$ having a small diameter $d(\Delta)$, then the average density of the substance in this small region will characterize the density of the substance in the immediate neighborhood of $P$. (The diameter of any region in our space is completely defined, since the distance between any two points is defined.) Suppose, as we shall assume, that this average density tends to a limit $f(P)$ as $\Delta$ contracts to the point $P$, that is, as $d(\Delta) \to 0$. We then call this limit the density of the substance at the point $P$. It is in this way that we define the density of all physical substances (such as mass, electricity, etc.).

We see that the density $f(P)$ of the substance is a function of the point $P$, while the quantity $F(\Delta)$ is a function of the set $\Delta$. We can say that the function $F(\Delta)$ gives a global (region-related) characteristic of the distribution of our substance in the region $D$, and the function $f(P)$ a local (point-related) one. If the function $F(\Delta)$ is given, then we obtain $f(P)$ from it by some differentiation process. The essence of the differentiation process can perhaps be described most conveniently as the passage from the global description of a phenomenon to its local characterization.
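The passage from $F(\Delta)$ to $f(P)$ can be imitated numerically. For illustration only, the Python sketch below assumes a substance on the interval $D = [0, 1]$ whose amount in a subinterval $[a, b]$ is $F([a, b]) = b^3 - a^3$, with length as the measure. The average density over an interval of diameter $d$ centred at $P$ is then $3P^2 + d^2/4$, which tends to the density $f(P) = 3P^2$.

```python
def F(a, b):
    # Hypothetical amount of substance in the interval [a, b] (illustrative choice).
    return b**3 - a**3


def average_density(P, d):
    # F(Delta) / m(Delta) for Delta = [P - d/2, P + d/2], an interval of diameter d.
    a, b = P - d / 2, P + d / 2
    return F(a, b) / (b - a)


P = 0.5
for d in (0.1, 0.01, 0.001):
    print(d, average_density(P, d))   # approaches 3 * P**2 = 0.75 as d -> 0
```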
But let us now suppose, conversely, that at each point $P$ of the region $D$ the density $f(P)$ of the substance is given, and we are to determine the total amount $F(D)$ of the substance contained in this region. This is the problem of integration in its most general form: the passage from the local to the global characterization of the phenomenon in question. Let us assume, for simplicity, that the function $f(P)$ is uniformly continuous on $D$. To solve our problem, we subdivide $D$ into cells $\Delta_1, \Delta_2, \ldots, \Delta_n$ of small diameter,¹ and in each cell $\Delta_k$ we choose an arbitrary point $P_k$. The quantity $f(P_k)$ is the limit of the ratio $F(\Delta)/m(\Delta)$ as the subregion $\Delta$ containing the point $P_k$ decreases indefinitely in diameter. Therefore, if the cells $\Delta_k$ are very small, we have

$$\left| f(P_k) - \frac{F(\Delta_k)}{m(\Delta_k)} \right| < \varepsilon,$$

where $\varepsilon$ is any positive number specified in advance. Thus,

$$F(\Delta_k) - \varepsilon\,m(\Delta_k) < f(P_k)\,m(\Delta_k) < F(\Delta_k) + \varepsilon\,m(\Delta_k).$$

Adding these inequalities with respect to $k$, we find

$$F(D) - \varepsilon\,m(D) < \sum_{k=1}^{n} f(P_k)\,m(\Delta_k) < F(D) + \varepsilon\,m(D),$$

and, as the cell diameters decrease indefinitely, we have in the limit

$$\lim \sum_{k=1}^{n} f(P_k)\,m(\Delta_k) = F(D).$$


We see, then, that given the function $f(P)$ (the density of the substance at each point), we can actually determine the quantity $F(D)$ (the total amount of the substance) by the already familiar process of integration. We divide the region $D$ into small cells $\Delta_k$, select the points $P_k$, and sum the products $f(P_k)\,m(\Delta_k)$ over all the cells. Finally, we pass to the limit under the condition that the diameters of the cells tend to zero.
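The reverse passage, from density to total amount, can be sketched in the same setting. The Python example below is our own illustration. It assumes the hypothetical density $f(P) = 3P^2$ on $D = [0, 1]$, whose total amount is $F(D) = 1$. It forms $\sum f(P_k)\,m(\Delta_k)$ over $n$ equal cells, with each $P_k$ chosen at random in its cell.

```python
import random


def density(P):
    # Hypothetical density on D = [0, 1] (illustrative); the total amount is F(D) = 1.
    return 3 * P * P


def total_amount(n, rng):
    """Sum of f(P_k) * m(Delta_k) over the n equal cells Delta_k of [0, 1],
    with P_k an arbitrary (here random) point of Delta_k."""
    m = 1.0 / n
    return sum(density((k + rng.random()) * m) * m for k in range(n))


rng = random.Random(0)
for n in (10, 100, 1000):
    print(n, total_amount(n, rng))   # tends to F(D) = 1
```

Because this density is increasing, each sum lies between the lower and upper sums. These two sums bracket $1$ and differ by $3/n$.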


¹We are restricting ourselves to the case where this can be done: for example, when $D$ is a bounded region in Euclidean space. Here the author is only interested in the general picture and, consequently, makes no attempt to establish a detailed theory with precise hypotheses on $D$ and $f(P)$. The reader who wishes to go further into the subject may consult such texts as S. Saks, Theory of the Integral (Warsaw Mathematical Monographs, Vol. 7), translated by L. C. Young with notes by S. Banach, 2d rev. ed. (New York: Hafner Publishing Company, 1937).
Thus, from the very general point of view which we have
adopted, the process of integration in arbitrary spaces appears to 
be a method allowing us to determine the amount of substance 
contained in any region when we know the density of distribution 
of this substance at each particular point. Every procedure of this 
type, unfailingly, has as its correlate some differentiation process 
(solving the inverse problem), whose goal is to find the local 
density of the distributed substance when the amount contained in 
each subregion is known. 

It is entirely possible, with a sufficiently exact formulation of 
the fundamental assumptions, to construct a theory of integration 
on this general, abstract base. Ordinary, double, and other special 
integrals then become particular cases of such a general theory, 
their most important properties being established once and for all 
by this general theory, so that they do not require separate proofs 
in each particular case. 
7. Expansion of Functions in Series


50. USE OF SERIES IN THE STUDY OF FUNCTIONS 


From the original geometric definitions of the sine and cosine, we can at once deduce a number of very important properties of these functions. The introduction of the analytic expressions $\sin x$ and $\cos x$ adds nothing new to our knowledge of these functions, just as we learn nothing new about the properties of the Dirichlet function when we denote it by $f(x)$. When we subsequently obtain new analytic expressions for the trigonometric functions in the form of the power series

$$\sin x = \sum_{n=1}^{\infty} (-1)^{n-1} \frac{x^{2n-1}}{(2n-1)!} \quad\text{and}\quad \cos x = \sum_{n=0}^{\infty} (-1)^{n} \frac{x^{2n}}{(2n)!},$$

however, we easily discover a number of properties of these functions. Without analytic expressions, these properties could only be derived from the original definitions by the most complicated reasoning, if at all. Above all, these expansions allow us to compute the values of the trigonometric functions in a simple manner for all values of the argument.
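As a minimal sketch of computing values from these expansions, the Python code below sums the first terms of each series and compares the results with the standard `math.sin` and `math.cos`. It is an illustration only. We do not claim that library implementations work this way; they typically reduce the argument first, and large arguments need many terms.

```python
import math


def sin_partial(x, N):
    # Sum of (-1)**(n-1) * x**(2n-1) / (2n-1)! for n = 1..N
    return sum((-1) ** (n - 1) * x ** (2 * n - 1) / math.factorial(2 * n - 1)
               for n in range(1, N + 1))


def cos_partial(x, N):
    # Sum of (-1)**n * x**(2n) / (2n)! for n = 0..N
    return sum((-1) ** n * x ** (2 * n) / math.factorial(2 * n)
               for n in range(0, N + 1))


for x in (0.5, 2.0, 10.0):
    print(x, sin_partial(x, 30), math.sin(x), cos_partial(x, 30), math.cos(x))
```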

In Lecture 3, we pointed out the danger in the excessive wor- 
ship of analytic expressions. And among the representatives of the 
applied sciences, we often encounter the harmful consequences of 
this phenomenon. We cannot deny, however, that among modern 
mathematicians we occasionally find an attitude at the opposite ex- 
treme. This deliberate unconcern with analytical expressions, and 
a concomitant helplessness in dealing with them, may be no 
less harmful. If, however, we can approach analytic expressions with 
a complete mastery, while simultaneously bearing in mind that they 
are only an instrument for the study of functions, then they can 
play a role of decisive importance in this study. It is perfectly 
natural for a mathematician investigating a given function to begin 
by trying to find an appropriate analytical expression for it. By 
means of such an apparatus, he will, in most cases, obtain in the
most efficient manner interesting and important properties of the
function.
Due to their simplicity, flexibility, clearness, and convenience in
application, series expansions unquestionably take first place 
among the various analytical devices capable of serving as instru- 
ments in the study of functions. The idea of this extremely impor- 
tant analytical apparatus is very simple: the function to be investi- 
gated is represented as the limit of a sequence of other functions 
(the partial sums of the representing series) which are simpler and 
more accessible to study. If such a partial sum approximates 
the function closely in the entire region under consideration, it is 
reasonable to expect that from the properties of this partial sum 
we can learn, if only approximately, some properties of the func- 
tion itself. In particular, if we know how to compute approximately 
the values of these partial sums for various values of the argument, 
we have a method for approximating the corresponding values of 
the functions. 

But what functions will most conveniently and usefully serve as 
elements of the expansion, that is, serve as the terms of the series 
which is to represent the given function and help us in its study? 
To this question (as it is natural to expect) there is no unique and 
universally applicable answer; here almost everything depends on 
the nature of the function and the character of the problem before 
us. We must, however, note that there are a few types of series of 
such demonstrated merit that they are employed very frequently, 
and this has naturally led to their extensive development. In 
the first rank among such series belong power series (in which the 
elements of the expansion are integral, primarily nonnegative 
powers of the independent variable) and trigonometric series (with 
elements of the form sin kx and cos kx, where k = 0,1, 2,...). In 
many cases, however, it is convenient to choose as elements of the 
expansion not these simplest, universal functions, but totally differ- 
ent functions. These functions, though not so simple, are by their 
properties more closely related to the function under study (for example, the so-called proper functions, or eigenfunctions, of boundary value problems).
In general, the guiding principle in selecting the type of expansion 
to be used should be the absence of any bias; the specific charac- 
ter of the problem before us should be taken into account in each 
and every instance. 

In what follows, we shall touch briefly on only the most important
questions related to the expansion of functions in power series and 
trigonometric series. 
51. EXPANSION IN POWER SERIES


We know that the domain of convergence of a power series is an (open, closed, or half-closed) interval with end points $-r$ and $r$. What properties must a function $f(x)$ have to be expressible in a power series

$$f(x) = \sum_{n=0}^{\infty} a_n x^n$$

convergent in this interval? We know that it is necessary that $f(x)$ be continuous in the open interval $(-r, r)$ (Lecture 4, p. 98), but this is far from sufficient.


THEOREM 1. If $f(x)$ is to be expressed as a power series convergent in $(-r, r)$, it must have a derivative $f'(x)$ at each point $x$ of $(-r, r)$. Moreover, this $f'(x)$ can be represented by a power series

$$\sum_{n=1}^{\infty} n a_n x^{n-1}, \qquad (1)$$

convergent in the interval $(-r, r)$ and obtained by differentiating the given series term by term.


In proving this, we must keep in mind that neither the existence of the derivative $f'(x)$ nor the convergence of the series (1) is given beforehand, so both of these facts will have to be established in the course of our argument.


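Proof. Let $|x| < r$ and let $\rho$ be any number between $|x|$ and $r$. For $|h| < \rho - |x|$, we have $|x + h| < \rho < r$, and, consequently,

$$f(x + h) = \sum_{n=0}^{\infty} a_n (x + h)^n.$$

From this,

$$\frac{f(x + h) - f(x)}{h} = \sum_{n=1}^{\infty} a_n \frac{(x + h)^n - x^n}{h} = S_N(h) + R_N(h),$$

where

$$S_N(h) = \sum_{n=1}^{N} a_n \frac{(x + h)^n - x^n}{h} \quad\text{and}\quad R_N(h) = \sum_{n=N+1}^{\infty} a_n \frac{(x + h)^n - x^n}{h}.$$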
n=0 
Since

$$\left| \frac{(x + h)^n - x^n}{h} \right| = \left| (x + h)^{n-1} + (x + h)^{n-2} x + \cdots + x^{n-1} \right| \leq \rho^{n-1} + \rho^{n-2}|x| + \cdots + |x|^{n-1} = \frac{\rho^n - |x|^n}{\rho - |x|} \leq \frac{\rho^n}{\rho - |x|},$$

we have, for $|h| < \rho - |x|$,

$$|R_N(h)| \leq \frac{1}{\rho - |x|} \sum_{n=N+1}^{\infty} |a_n| \rho^n.$$

But $\rho < r$, so the series $\sum |a_n| \rho^n$ converges. The right side of this inequality is a fixed multiple of the remainder of this convergent series, so it becomes smaller than an arbitrarily small positive number $\varepsilon$ for sufficiently large $N$. Thus, if $N$ is sufficiently large and $|h| < \rho - |x|$, we have

$$|R_N(h)| < \varepsilon.$$

Hence,

$$S_N(h) - \varepsilon < \frac{f(x + h) - f(x)}{h} < S_N(h) + \varepsilon.$$


If now $N$ remains fixed, but $h$ is made to tend to zero, the sum $S_N(h)$ will have as its limit the sum

$$S_N = \sum_{n=1}^{N} n a_n x^{n-1}.$$
n=1 


The last inequality therefore allows us to assert that, as $h \to 0$, the upper and lower limits of the quantity

$$\frac{f(x + h) - f(x)}{h}$$

will differ from $S_N$ by no more than $\varepsilon$, and hence will differ from each other by no more than $2\varepsilon$. Since $\varepsilon$ is arbitrary, it follows that these limits coincide, that is, that the limit

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$

exists. Moreover, we have

$$|f'(x) - S_N| \leq \varepsilon$$

under the sole condition that $N$ is sufficiently large. But this means that the series (1) converges and that its sum is $f'(x)$. Thus, we have proved all our assertions.
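Theorem 1 can be checked on a series whose sum and derivative are both known. The Python sketch below uses our own example, the geometric series $\sum x^n = 1/(1 - x)$ with $r = 1$, so that every $a_n = 1$. It compares partial sums of the term-by-term derivative (1) with $f'(x) = 1/(1 - x)^2$, and also computes a plain difference quotient.

```python
def f(x):
    # Sum of the geometric series sum_{n>=0} x**n, valid for |x| < 1.
    return 1.0 / (1.0 - x)


def derived_series(x, N):
    # Partial sum S_N = sum_{n=1}^{N} n * x**(n-1) of series (1) with all a_n = 1.
    return sum(n * x ** (n - 1) for n in range(1, N + 1))


x = 0.5
for N in (5, 20, 60):
    print(N, derived_series(x, N))   # approaches f'(0.5) = 1 / (1 - 0.5)**2 = 4
h = 1e-6
print((f(x + h) - f(x)) / h)         # difference quotient, also close to 4
```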
We have shown that a function represented by a power series must be not only continuous but also differentiable. Moreover, $f'(x)$ can itself be represented by a power series converging for all $|x| < r$. By virtue of the very theorem we have just proved, it follows that the second derivative $f''(x)$ also exists. Repeating this reasoning, we come to the following conclusion.


THEOREM 2. A function which can be expressed in an interval by 
a power series must have, at every interior point of that interval, de- 
rivatives of all orders. Moreover, each of these derivatives may be 
represented in this (open) interval by a power series obtained by dif- 
ferentiating the given series term by term a corresponding number of 
times. 


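Thus,

$$f(x) = \sum_{n=0}^{\infty} a_n x^n, \qquad f'(x) = \sum_{n=1}^{\infty} n a_n x^{n-1},$$

and, in general,

$$f^{(k)}(x) = \sum_{n=k}^{\infty} n(n-1)\cdots(n-k+1)\,a_n x^{n-k}.$$

Substituting $x = 0$ in this formula, we find

$$f^{(k)}(0) = k!\,a_k \qquad (k = 0, 1, 2, \ldots),$$

whence

$$a_k = \frac{f^{(k)}(0)}{k!} \qquad (k = 0, 1, 2, \ldots).$$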

Thus, we have simultaneously proved the uniqueness of the expansion of a function into a power series and found the formulas expressing the coefficients of this expansion in terms of the derivatives of the given function at $x = 0$. Therefore, if a function $f(x)$ can be expanded in a power series, this expansion must have the form

$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} x^n. \qquad (2)$$


This is the so-called Maclaurin series. Substitute $a + h$ for $x$, where $|a| < r$, and denote $f(a + h)$ by $\phi(h)$. It may be shown that $\phi(h)$ can be expressed as a series of powers of $h$, convergent for $|h| < r - |a|$, where $(-r, r)$ is the interval of convergence of (2). We thus have

$$\phi(h) = \sum_{n=0}^{\infty} \frac{\phi^{(n)}(0)}{n!} h^n.$$
From this, by returning to our original notation, we arrive at the more general Taylor series

$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x - a)^n. \qquad (3)$$
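The re-centred expansion can be checked on the geometric series $1/(1 - x) = \sum x^n$, for which $r = 1$; this example is our own. The $n$-th derivative of this function is $n!/(1 - x)^{n+1}$, so its Taylor coefficients at $a$ are $1/(1 - a)^{n+1}$. With $a = 1/2$, the statement above guarantees convergence for $|x - a| < r - |a| = 1/2$. The Python sketch below sums the series (3) at two such points.

```python
def taylor_coeff(n, a):
    # f^(n)(a) / n! for f(x) = 1 / (1 - x), whose n-th derivative is n! / (1 - x)**(n + 1).
    return 1.0 / (1.0 - a) ** (n + 1)


def taylor_partial(x, a, N):
    # Partial sum of the Taylor series (3) about the point a, up to the power N.
    return sum(taylor_coeff(n, a) * (x - a) ** n for n in range(N + 1))


a = 0.5   # centre of the expansion; r - |a| = 1 - 0.5 = 0.5
for x in (0.2, 0.8):
    print(x, taylor_partial(x, a, 80), 1.0 / (1.0 - x))
```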

The situation becomes particularly clear when we compare this 
with what was said in Lecture 5 about the Taylor and Maclaurin 
expansions. There we were not concerned with infinite series; we 
called the quantity

$$R_n(x) = f(x) - \sum_{k=0}^{n} \frac{f^{(k)}(0)}{k!} x^k$$

the remainder of the Maclaurin expansion, and we examined its behavior for infinitely small values of $x$. It is clear that the behavior of $R_n(x)$ decides whether the function $f(x)$ can be expanded in a power series. In Lecture 5 we considered $n$ to be constant and made $x$ tend to zero. Now, on the contrary, the value of $x$ is kept fixed and $n$ is allowed to increase without limit. The condition

$$\lim_{n \to \infty} R_n(x) = 0$$

is quite obviously necessary and sufficient for the validity of formula (2). We usually establish that a given function can be expanded in a Maclaurin series by considering one of the many forms in which the remainder $R_n(x)$ may be expressed; we encountered some of these in Lecture 5. Which of these forms is most convenient for this purpose depends entirely on the nature of the function whose expansion we seek to obtain.
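As an illustration of the condition $\lim_{n \to \infty} R_n(x) = 0$, take $f(x) = e^x$, whose derivatives at $0$ all equal $1$; the example is our own choice. The Python sketch below keeps $x = 2$ fixed and lets $n$ grow. We define the remainder with the sum running up to $k = n$, as in the formula for $R_n(x)$ above.

```python
import math


def maclaurin_remainder_exp(x, n):
    # R_n(x) = e**x - sum_{k=0}^{n} x**k / k!, since every derivative of e**x at 0 is 1.
    return math.exp(x) - sum(x ** k / math.factorial(k) for k in range(n + 1))


for n in (2, 5, 10, 20):
    print(n, maclaurin_remainder_exp(2.0, n))   # decreases toward 0 as n grows
```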

As we have seen above, a function which can be represented in a given interval by a power series must have, at every interior point of this interval, derivatives of all orders. Conversely, for any function having this property, it is formally possible to write the Taylor series (3) for each point $a$ of this interval. One might therefore be tempted to suppose that this condition is sufficient for developing a function in a power series. This, however, is not the case. First of all, it may happen that the Maclaurin series (2), written formally for the given function, diverges for all $x \neq 0$. Much more interesting, however, is the fact that, even when the Maclaurin series constructed for the given function converges, its sum may not equal the given function. A classical example of such a case is given by

$$\varphi(x) = \begin{cases} e^{-1/x^2}, & x \neq 0, \\ 0, & x = 0. \end{cases}$$
It may easily be verified that at $x = 0$ the function itself and its derivatives of all orders have the value zero. Consequently, all coefficients of the Maclaurin series are equal to zero, and so is its sum for all values of $x$.
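The Python sketch below, our own illustration, evaluates this function numerically. Its Maclaurin series sums to $0$ everywhere, yet the function is positive at every $x \neq 0$. Near $0$ it also vanishes faster than any fixed power of $x$; the ratio $\varphi(x)/x^{10}$ is used as a sample.

```python
import math


def phi(x):
    # phi(x) = exp(-1/x**2) for x != 0, and phi(0) = 0.
    return math.exp(-1.0 / (x * x)) if x != 0 else 0.0


# All derivatives of phi vanish at 0, so its Maclaurin series is 0 + 0*x + 0*x**2 + ...
# and sums to 0 everywhere, while phi itself is positive for every x != 0.
for x in (1.0, 0.5, 0.2):
    print(x, phi(x))

# Near 0, phi(x) is smaller than any fixed power of x; here phi(x) / x**10 -> 0.
for x in (0.5, 0.2, 0.1):
    print(x, phi(x) / x ** 10)
```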

From this it follows that, while a given function can be devel- 
oped in a power series in no more than one way, any convergent 
power series may be regarded as the Maclaurin series not merely 
of one, but of an infinite set of functions. For, let f(x) be the sum 
of the given series. The given series is thus the Maclaurin expansion 
for not only $f(x)$, but also every function of the form $f(x) + \alpha\varphi(x)$,
where $\alpha$ is any real number and $\varphi(x)$ is the function defined above.
Only one function of this family, however, can be represented by 
the given series (that is, can be the sum of the series). 


52. SERIES OF POLYNOMIALS AND THE 
WEIERSTRASS THEOREM 


Since the successive partial sums of a power series are polyno- 
mials of continually increasing degree, every function f(x) devel- 
opable in a power series can be represented by an approximating 
polynomial with any specified degree of accuracy. We can express 
this more precisely as follows: If a function f(x) can be expanded 
in a power series with radius of convergence equal to $r$, then, for any positive $\rho$ less than $r$ and for any positive $\varepsilon$, there exists a polynomial which differs from $f(x)$ by less than $\varepsilon$ for all $x$ in the interval $[-\rho, \rho]$. We cannot, however, make the same assertion concerning the interval $(-r, r)$, since in this interval the convergence of the series may be nonuniform. Let us recall that it was to this
same problem of the approximation of functions by means of poly- 
nomials that we applied the Taylor and Maclaurin formulas in 
Lecture 5. 
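The difference between $[-\rho, \rho]$ and $(-r, r)$ can be seen on the geometric series $\sum x^n = 1/(1 - x)$, for which $r = 1$; this is our own example. The error of the partial sum of degree $N$ is $|x|^{N+1}/|1 - x|$. On $[-0.9, 0.9]$ this error is largest at $x = 0.9$ and tends to zero as $N$ grows. On $(-1, 1)$ it is unbounded for every fixed $N$. The Python sketch below compares the two.

```python
def partial_sum(x, N):
    # sum_{n=0}^{N} x**n, a polynomial of degree N
    return sum(x ** n for n in range(N + 1))


def error(x, N):
    # Distance between the series sum 1 / (1 - x) and its partial sum of degree N.
    return abs(1.0 / (1.0 - x) - partial_sum(x, N))


N = 100
print(error(0.9, N))     # worst case on [-0.9, 0.9]: 0.9**101 / 0.1, about 2.4e-4
print(error(0.999, N))   # near the end of (-1, 1) the same N is far from enough
```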

But, while it is possible to approximate by polynomials any 
function capable of expansion in a power series, the converse 
is not true. That is to say, suppose a function $f(x)$ in a given interval $[a, b]$ can be approximated by polynomials with arbitrarily specified accuracy. It does not follow that it is developable in a power series. This remark is of great theoretical importance. The approximate representation of functions by polynomials is one of the most important tools for studying them. On the other hand, we know that expansion in a power series is possible for only a comparatively narrow class of functions, a class
which does not even include all functions having derivatives of all 
orders (which is itself already a very restrictive requirement). 
Hence, if the possibility of approximating a function by means of 
polynomials were to depend upon its capability of expansion in a 
power series, this very valuable property would indeed be limited 
to a highly restricted class of functions. 

In reality, the case is otherwise. It turns out that a necessary 
and sufficient condition for the possibility of approximating, with any 
required accuracy, a function in a given interval by means of polyno- 
mials, is its continuity on the interval. Thus, even nowhere differ- 
entiable functions can have such an approximate representation. The 
necessity of continuity is obvious: A function which can be repre- 
sented with any desired accuracy by a polynomial is the limit of a 
uniformly convergent sequence of polynomials, and, consequently, 
it is also the sum of a uniformly convergent series of polynomials, 
that is, of continuous functions. Hence, by a theorem which 
we have already encountered in Lecture 4 (p. 91), it must itself be 
a continuous function. That this necessary condition is also suffi- 
cient is one of the most profound and important facts of mathemat- 
ical analysis and constitutes the following celebrated theorem. 


WEIERSTRASS’ THEOREM. If a function f is continuous on a closed interval [a, b], then given any ε > 0, there exists a polynomial $P_\varepsilon(x)$ such that

$$|f(x) - P_\varepsilon(x)| < \varepsilon \qquad (a \le x \le b).$$


Before proving this,¹ we must consider some rather remote preliminaries.

Let us consider the integral

$$I_n = \int_{-1}^{+1} (1 - u^2)^n \, du,$$

which is clearly greater than zero for every n = 1, 2, .... Let us divide the domain of integration into two parts. We first consider separately the integral

$$K_n = \int_{-n^{-1/3}}^{n^{-1/3}} (1 - u^2)^n \, du,$$

extending over the interval $(-n^{-1/3}, n^{-1/3})$, which is small for large n. We then consider the complementary integral

$$L_n = \int_{n^{-1/3} \le |u| \le 1} (1 - u^2)^n \, du,$$

which is actually the sum of two integrals extending over the intervals $[-1, -n^{-1/3}]$ and $[n^{-1/3}, 1]$, respectively. As n increases indefinitely, the domain of integration of the integral $K_n$ contracts to the point u = 0. At the same time, the domain of integration of $L_n$ expands, tending to include the whole interval [−1, 1]. Nevertheless, we shall presently see that, for large values of n, the component $K_n$ of the sum $I_n = K_n + L_n$ exhausts almost the entire value of $I_n$. Only a negligible part of this value is left for the integral $L_n$. The explanation for this is that, for large n, the integrand $(1 - u^2)^n$ has a perceptible value only for very small values of |u|. For somewhat greater values of |u| it becomes negligibly small. The graph of this function is given in Figure 33.

¹ At present, a considerable variety of different proofs of this theorem is known. We have selected here one of the most interesting from the methodological point of view.

LEMMA. As n → ∞,

$$\frac{K_n}{I_n} \to 1, \qquad \frac{L_n}{I_n} \to 0.$$

Proof. Since $K_n/I_n + L_n/I_n = 1$, each of these two relations is evidently a consequence of the other one. It is therefore sufficient to prove either one of them.


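The integrand in $L_n$ clearly attains its maximum value when $|u| = n^{-1/3}$. Since the domain of integration is a part of the interval [−1, 1], we have

$$L_n < 2\left(1 - n^{-2/3}\right)^n.$$

On the other hand, it is evident that

$$I_n > \int_{-1/n}^{1/n} (1 - u^2)^n \, du > \frac{2}{n}\left(1 - \frac{1}{n^2}\right)^n,$$

and consequently

$$\frac{L_n}{I_n} < n\,\frac{\left(1 - n^{-2/3}\right)^n}{\left(1 - n^{-2}\right)^n}.$$

But¹

$$\left(1 - n^{-2/3}\right)^n < e^{-n^{1/3}}.$$

Moreover, by Bernoulli's inequality $(1 - t)^n \ge 1 - nt$ $(0 \le t \le 1)$,

$$\left(1 - n^{-2}\right)^n \ge 1 - \frac{1}{n} \ge \frac{1}{2} \qquad (n \ge 2).$$

From these two inequalities it follows that

$$\frac{L_n}{I_n} < 2n\,e^{-n^{1/3}},$$

or, setting $n^{1/3}/3 = z$,

$$\frac{L_n}{I_n} < 54\,z^3 e^{-3z} = 54\left(z e^{-z}\right)^3.$$

But as n → ∞ we have z → ∞; and since, under these circumstances, $z e^{-z} \to 0$, we have²

$$\frac{L_n}{I_n} \to 0 \qquad (n \to \infty),$$

which proves our lemma.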


¹ It is known that $1 + x < e^x$ for all x ≠ 0. The simplest way to prove this is to find the minimum value of the function $e^x - 1 - x$.

² The simplest way to see this is to observe that for z > 0, we have

$$e^z = 1 + z + \frac{z^2}{2!} + \cdots > \frac{z^2}{2}, \quad \text{and, consequently,} \quad z e^{-z} = \frac{z}{e^z} < \frac{2}{z}.$$
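The lemma can also be watched at work numerically. The sketch below is our own illustration and not part of the original argument. It approximates $I_n$, $K_n$ and $L_n$ by composite Simpson quadrature and prints the two ratios for several values of n. The helper names `simpson` and `lemma_ratios`, and the choice of quadrature, are assumptions made for the example. $L_n$ is computed directly from its outer pieces, using the symmetry of the integrand, so that its tiny value is not lost by subtracting two nearly equal numbers.

```python
def simpson(g, lo, hi, m=4000):
    """Composite Simpson's rule for the integral of g over [lo, hi] (m even subintervals)."""
    if m % 2:
        m += 1
    h = (hi - lo) / m
    total = g(lo) + g(hi)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

def lemma_ratios(n):
    """Return (K_n / I_n, L_n / I_n) for the integrand (1 - u^2)^n."""
    g = lambda u: (1.0 - u * u) ** n
    r = n ** (-1.0 / 3.0)            # split point n^(-1/3)
    I = simpson(g, -1.0, 1.0)
    K = simpson(g, -r, r)
    L = 2.0 * simpson(g, r, 1.0)     # both outer pieces at once, by symmetry
    return K / I, L / I

for n in (10, 100, 1000, 10000):
    k_ratio, l_ratio = lemma_ratios(n)
    print(f"n = {n:5d}   K_n/I_n = {k_ratio:.12f}   L_n/I_n = {l_ratio:.3e}")
```

The printed $L_n/I_n$ column should fall rapidly toward zero while $K_n/I_n$ approaches 1, in line with the lemma.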
You may wonder what possible significance our detailed examination of the integral $I_n$ could have for the proof of such a general proposition as the Weierstrass theorem. This is by no means the only instance in mathematical analysis where an extremely general theorem is proved by applying a very special analytic tool, such as the integral $I_n$ in the present case. The specific property of
this integral which makes it a convenient instrument in establish- 
ing the Weierstrass theorem finds its expression in the lemma 
which we have just proved. Our proof of the Weierstrass theorem 
can thus serve as an instructive example of the methods used 
in analysis. 


Proof of the Weierstrass theorem. Let f(x) be any continuous function on the interval [0, 1]. The integral

$$P_n(x) = \frac{1}{I_n} \int_0^1 f(v)\,\{1 - (v - x)^2\}^n \, dv$$

is clearly a polynomial in x of degree at most 2n, since the integrand is such a polynomial in x. Transforming the variable of integration by setting v = x + u, we obtain

$$P_n(x) = \frac{1}{I_n} \int_{-x}^{1-x} f(u + x)\,(1 - u^2)^n \, du. \qquad (3')$$


Let 0 < α < β < 1 and α ≤ x ≤ β; then −x ≤ −α < 0 < 1 − β ≤ 1 − x. Consequently, if n is so large that $n^{-1/3}$ is less than both α and 1 − β, then

$$-x < -n^{-1/3} < n^{-1/3} < 1 - x \qquad (\alpha \le x \le \beta).$$


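In the equality (3′), we can write

$$\int_{-x}^{1-x} = \int_{-x}^{-n^{-1/3}} + \int_{-n^{-1/3}}^{n^{-1/3}} + \int_{n^{-1/3}}^{1-x}.$$

Denoting by M the maximum value of |f(x)| in the interval [0, 1], we clearly have

$$\left| \frac{1}{I_n}\left( \int_{-x}^{-n^{-1/3}} + \int_{n^{-1/3}}^{1-x} \right) f(u + x)(1 - u^2)^n \, du \right| \le \frac{M}{I_n}\left( \int_{-1}^{-n^{-1/3}} + \int_{n^{-1/3}}^{1} \right)(1 - u^2)^n \, du = M\,\frac{L_n}{I_n} \to 0$$

as n → ∞, by virtue of our lemma. Since $L_n/I_n$ does not depend upon x, the left side of this inequality tends to zero uniformly with respect to x in the whole interval [α, β] as n → ∞. Setting

$$\frac{1}{I_n}\left( \int_{-x}^{-n^{-1/3}} + \int_{n^{-1/3}}^{1-x} \right) f(u + x)(1 - u^2)^n \, du = R_n(x),$$

we then have

$$P_n(x) = \frac{1}{I_n} \int_{-n^{-1/3}}^{n^{-1/3}} f(u + x)(1 - u^2)^n \, du + R_n(x),$$

where $R_n(x) \to 0$ uniformly in the interval [α, β] as n → ∞.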

No doubt, you have already guessed that it is the polynomial $P_n(x)$ that will turn out to be the desired approximation to f(x). If so, then you must also have divined the further course of the proof. Because $R_n(x)$ is uniformly small for large values of n, it is only necessary to prove one more thing: for large values of n, the first term on the right side of the last equality differs little from f(x) in the whole interval [α, β]. But this is almost obvious, since the integrand f(u + x) differs little from f(x) when the values of u belonging to the domain of integration are extremely small. So, replacing f(u + x) by f(x), we thereby replace the whole first term by the expression $f(x)\,K_n/I_n$, which, in view of our lemma, tends to f(x) as n → ∞.

To complete the proof, we have only to carry through this reasoning with the necessary formal rigor. For this purpose it will be simplest to use the first mean value theorem (Lecture 6, p. 155), by virtue of which

$$\int_{-n^{-1/3}}^{n^{-1/3}} f(u + x)(1 - u^2)^n \, du = f(x + \theta n^{-1/3})\,K_n,$$

where −1 < θ < 1. We then obtain

$$|f(x) - P_n(x)| = \left| f(x) - f(x + \theta n^{-1/3})\frac{K_n}{I_n} - R_n(x) \right| \le \left| f(x) - f(x + \theta n^{-1/3})\frac{K_n}{I_n} \right| + |R_n(x)|.$$

Let ε > 0 be an arbitrarily small number. As we already know, for sufficiently large n, we have

$$|R_n(x)| < \frac{\varepsilon}{2} \qquad (\alpha \le x \le \beta).$$
On the other hand, $K_n/I_n \to 1$ as n → ∞, and the function f(x) is uniformly continuous in the whole interval [0, 1]. It follows that, for sufficiently large n, we have

$$\left| f(x) - f(x + \theta n^{-1/3})\frac{K_n}{I_n} \right| < \frac{\varepsilon}{2} \qquad (\alpha \le x \le \beta).$$

Thus, for sufficiently large n, we obtain

$$|f(x) - P_n(x)| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon \qquad (\alpha \le x \le \beta).$$

In other words, as n → ∞ the polynomial $P_n(x)$ tends to f(x) uniformly in [α, β].
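Here is a numerical sketch of the construction just completed, built on assumptions of our own. We take f(x) = |x − 1/2| as a continuous function with a corner, and we take [α, β] = [0.2, 0.8]. $P_n(x)$ is evaluated directly from its integral definition by Simpson quadrature, rather than by expanding the polynomial. The names `weierstrass_poly` and `max_error` are hypothetical. The maximum error on the grid should shrink as n grows, but only slowly near the corner (roughly like $1/\sqrt{n}$).

```python
def simpson(g, lo, hi, m=2000):
    """Composite Simpson's rule for the integral of g over [lo, hi] (m even subintervals)."""
    if m % 2:
        m += 1
    h = (hi - lo) / m
    total = g(lo) + g(hi)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

def weierstrass_poly(f, n):
    """Return x -> P_n(x) = (1/I_n) * integral_0^1 f(v) * (1 - (v - x)^2)^n dv."""
    I_n = simpson(lambda u: (1.0 - u * u) ** n, -1.0, 1.0)
    def P(x):
        return simpson(lambda v: f(v) * (1.0 - (v - x) ** 2) ** n, 0.0, 1.0) / I_n
    return P

f = lambda x: abs(x - 0.5)                  # continuous on [0, 1], corner at x = 1/2
grid = [0.2 + 0.01 * i for i in range(61)]  # sample points of [alpha, beta] = [0.2, 0.8]

def max_error(n):
    """Largest |f(x) - P_n(x)| over the sample grid."""
    P = weierstrass_poly(f, n)
    return max(abs(f(x) - P(x)) for x in grid)

for n in (50, 200, 800):
    print(f"n = {n:4d}   max |f - P_n| on [0.2, 0.8] = {max_error(n):.4f}")
```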

The proof of the Weierstrass theorem is now essentially complete; we have only to free ourselves of some minor restrictions. First, we have to pass from the particular interval [0, 1] to any interval [a, b]. Second, we have to show that the uniform approximation required by the Weierstrass theorem takes place in the whole interval [a, b]. So far it has been shown only in any interval [a′, b′] lying entirely in the interior of [a, b], since the interval [α, β] was an interval lying entirely in the interior of [0, 1].

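To achieve the first of these two goals, let us set

$$\frac{x - a}{b - a} = y, \qquad x = a + (b - a)y \qquad (a \le x \le b),$$

and

$$f(x) = f[a + (b - a)y] = \varphi(y).$$

As x increases from a to b, y increases from 0 to 1. Since f(x) is continuous on [a, b], φ(y) is continuous on [0, 1]. Let a < a′ < b′ < b, and let us set

$$\frac{a' - a}{b - a} = \alpha, \qquad \frac{b' - a}{b - a} = \beta.$$

By what we have just proved, we can find, for any positive ε, a polynomial $P_n(y)$ such that

$$|\varphi(y) - P_n(y)| < \varepsilon \quad (\alpha \le y \le \beta), \qquad \text{or} \qquad \left| f(x) - P_n\!\left(\frac{x - a}{b - a}\right) \right| < \varepsilon \quad (a' \le x \le b').$$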
But it is clear that

$$P_n\!\left(\frac{x - a}{b - a}\right)$$

is a polynomial in x of the same degree as $P_n(y)$. Thus, a function f(x), continuous in any interval [a, b], can be uniformly approximated by polynomials, with any required accuracy, in every interval [a′, b′] contained entirely in the interior of the interval [a, b].
Finally, to remove the last restriction, let us assume again that f(x) is continuous on [a, b], and let us set

$$F(x) = \begin{cases} f(a) & \text{if } a - 1 \le x \le a, \\ f(x) & \text{if } a \le x \le b, \\ f(b) & \text{if } b \le x \le b + 1. \end{cases}$$

It is obvious that F(x) is defined and continuous on [a − 1, b + 1], which contains in its interior the whole interval [a, b]. By what we have proved, F(x) can be uniformly approximated by polynomials on the interval [a, b], with any required accuracy. Since F(x) = f(x) in this interval, the same thing is true for the function f(x), and the proof of the Weierstrass theorem is now complete.


The Weierstrass theorem can also be formulated as follows: every continuous function is the sum of a uniformly convergent series of polynomials. For, let $P_m(x)$ denote a polynomial differing from f(x) by less than 1/m in the interval [a, b]. Then the series

$$P_1(x) + \{P_2(x) - P_1(x)\} + \cdots + \{P_m(x) - P_{m-1}(x)\} + \cdots$$

has polynomials as its terms, and its m-th partial sum is $P_m(x)$. It therefore tends to f(x) uniformly on the interval [a, b].


53. TRIGONOMETRIC SERIES

A trigonometric series is a series of the form

$$\frac{a_0}{2} + \sum_{n=1}^{\infty} (a_n \cos nx + b_n \sin nx). \qquad (4)$$

All the terms of this series are periodic functions whose periods are divisors of 2π (the term of index n has period 2π/n), so the sum of this series will have the period 2π. Therefore, if a function which we wish to expand into such a series does not have this periodicity, its expansion in a series of type (4) can be achieved only for intervals whose length does not exceed 2π. And, because of the periodicity, it is sufficient to limit our study of the series (4) to some fixed interval of length 2π, for example, to the interval [−π, π].

We shall now present three reasons why the expansion of a 
function in a series of the form (4) may, in some cases, be more 
appropriate than its expansion in a power series. 

1. As we know, expansion in power series is possible only for 
those functions which have derivatives of all orders (and even this 
restrictive assumption is, in general, not sufficient). But, as we shall 
see later, for the development of a function in a series of the form 
(4) much more modest assumptions will suffice. In particular, such 
an expansion exists for every function which has a bounded 
and integrable derivative, and even this assumption is dispensable. 

2. The terms of the series (4) are periodic, wave-like functions, 
and a certain degree of this wave-like form is retained by the par- 
tial sums of this series, a characteristic which power series, in gen- 
eral, do not possess. Thus, if the function under consideration 
shows some tendency even to approximate wave-like form (as often 
happens in mechanics, physics, biology, and economics), we are 
justified in expecting that the partial sums of a trigonometric series
will imitate the behavior of such a function better than the poly- 
nomials which form the partial sums of a power series. 

3. Finally, the trigonometric functions which constitute the 
terms of the series (4) have one remarkable property, which greatly 
facilitates both the study of and the operations with trigonometric 
series, and which is completely lacking in the terms of power series. 
This property is the so-called orthogonality of the system of functions

$$1, \quad \sin nx, \quad \cos nx \qquad (n = 1, 2, \ldots) \qquad (5)$$

over any interval of length 2π. Two different functions are said to be orthogonal over an interval if the integral of their product over that interval is equal to zero. You can easily prove that this property applies to the functions in (5) over any interval of length 2π. To do so, transform the integrand into a sum by means of elementary trigonometric identities. For instance, $\cos mx \cos nx = \frac{1}{2}[\cos(m - n)x + \cos(m + n)x]$, and when m ≠ n each term on the right integrates to zero over any interval of length 2π.
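Orthogonality can also be checked numerically. The sketch below is ours, with hypothetical helper names `system5` and `gram_matrix`. It integrates every pairwise product of the first functions of (5) over an interval of length 2π starting at an arbitrary point, and prints the resulting table of integrals (a Gram matrix). The off-diagonal entries should vanish, while the diagonal shows 2π for the function 1 and π for each cosine and sine (facts used in Sections 54 and 55). Simpson quadrature is our own choice here. For trigonometric polynomials integrated over a full period, it is exact up to rounding.

```python
import math

def simpson(g, lo, hi, m=2000):
    """Composite Simpson's rule for the integral of g over [lo, hi] (m even subintervals)."""
    if m % 2:
        m += 1
    h = (hi - lo) / m
    total = g(lo) + g(hi)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

def system5(N):
    """The first 2N + 1 functions of system (5): 1, cos x, sin x, ..., cos Nx, sin Nx."""
    funcs = [lambda x: 1.0]
    for n in range(1, N + 1):
        funcs.append(lambda x, n=n: math.cos(n * x))
        funcs.append(lambda x, n=n: math.sin(n * x))
    return funcs

def gram_matrix(N, start):
    """Integrals of all pairwise products over [start, start + 2*pi]."""
    fs = system5(N)
    return [[simpson(lambda x: fi(x) * fj(x), start, start + 2 * math.pi) for fj in fs]
            for fi in fs]

for row in gram_matrix(2, 0.7):
    print(" ".join(f"{v:9.5f}" for v in row))
```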

It would be difficult to overestimate the importance of the 
property of orthogonality. Functions much more complicated than 
those in (5) may, if only they form an orthogonal system, serve as
the terms of series which turn out to be extremely useful in the in- 
vestigation of functions. Numerous properties of the trigonometric 
system (5) are shared by all orthogonal systems, a fact which has 
led to the construction of an entire theory of such systems. Today 
this theory is highly developed and includes many important and 
profound analytical results. 


54. FOURIER COEFFICIENTS 


In the case of power series, we have seen that the coefficients of 
the expansion of a given function can be determined easily if we 
know the values of the function and its derivatives at x = 0. It is 
natural to raise the same question with regard to trigonometric 
series. 

Suppose that f(x) is represented in the interval [−π, π] by a uniformly convergent series (4). We multiply both sides of the equality

$$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} (a_n \cos nx + b_n \sin nx)$$

by cos kx, where k is zero or a positive integer. This clearly gives the expansion of f(x) cos kx in a series which is uniformly convergent. Therefore, in integrating both sides of this new equality from −π to +π we may integrate the series on the right side term by term (see Lecture 4, p. 94). By virtue of the orthogonality of the system (5), the integrals of all the terms will vanish except for the one integral

$$\int_{-\pi}^{\pi} a_k \cos^2 kx \, dx = a_k \int_{-\pi}^{\pi} \frac{1 + \cos 2kx}{2} \, dx = \pi a_k. \qquad (6)$$

We have assumed here that k > 0; for k = 0 the integral differing from zero will be

$$\int_{-\pi}^{\pi} \frac{a_0}{2} \, dx = \pi a_0,$$

so that the general result (6) remains valid in this case also. (It is with just this end in mind that the first term of the series (4) is usually written in the form $a_0/2$.) Thus, for any k ≥ 0 we have

$$\int_{-\pi}^{\pi} f(x) \cos kx \, dx = \pi a_k,$$

or

$$a_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos kx \, dx \qquad (k = 0, 1, 2, \ldots). \qquad (7)$$

In a similar manner, we find

$$b_k = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin kx \, dx \qquad (k = 1, 2, \ldots). \qquad (8)$$
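Formulas (7) and (8) translate directly into a quadrature routine. The sketch below is our own and uses a hypothetical `fourier_coefficients` helper built on Simpson's rule. It is tested on f(x) = x², whose coefficients can be found by hand: $a_0 = 2\pi^2/3$, $a_k = 4(-1)^k/k^2$, and $b_k = 0$ because x² is even.

```python
import math

def simpson(g, lo, hi, m=2000):
    """Composite Simpson's rule for the integral of g over [lo, hi] (m even subintervals)."""
    if m % 2:
        m += 1
    h = (hi - lo) / m
    total = g(lo) + g(hi)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

def fourier_coefficients(f, K):
    """a_0..a_K and b_1..b_K of f on [-pi, pi] by formulas (7) and (8); b[0] is a placeholder."""
    a = [simpson(lambda x: f(x) * math.cos(k * x), -math.pi, math.pi) / math.pi
         for k in range(K + 1)]
    b = [0.0] + [simpson(lambda x: f(x) * math.sin(k * x), -math.pi, math.pi) / math.pi
                 for k in range(1, K + 1)]
    return a, b

a, b = fourier_coefficients(lambda x: x * x, 5)
for k in range(6):
    print(k, round(a[k], 8), round(b[k], 8))
```

Nothing in the routine uses a derivative of f. This matches the remark below that the formulas make sense for any integrable function.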


Formulas (7) and (8) solve our stated problem for the case 
when the series (4) is uniformly convergent; they express the co- 
efficients of the series by means of the function which this 
series represents. As you can see, these formulas in themselves do 
not even require the existence of the first derivative of f(x). The 
coefficients $a_k$ and $b_k$, which can thus be constructed for any integrable function, are usually called the Fourier coefficients of the
given function, and the series (4) constructed with their help 
is called the Fourier series of this function (even though formulas 
(7) and (8) were first discovered by Euler). Needless to say, it does 
not follow from this (the logical situation here being completely 
analogous to that encountered in our study of power series) that the 
Fourier series constructed for an integrable function must converge. 
In fact, any number of cases exist where the Fourier series con- 
structed for an integrable function is divergent. And even if it does 
converge, it does not follow that its sum must coincide with f(x). 

All that we have established up to this point can be formulated 
as follows: If a function f(x) can be expressed by a trigonometric 
series converging uniformly in [−π, π], then this series is the Fourier
series of the function, that is, its coefficients are expressed in terms 
of the given function by the formulas (7) and (8). In the case 
of Maclaurin series, we came to an analogous conclusion with only 
one difference: in that case, there was no need to require explicitly 
uniform convergence, since the very nature of a power series 
implies its uniform convergence in every subinterval contained 
within the interval of convergence. 
Obviously, the basic problem of the theory of trigonometric
series consists in determining the conditions under which a given 
function will equal the sum of its Fourier series. This question has 
occupied the attention of a very large number of investigators, and 
a voluminous literature has been dedicated to it. In what follows, 
we shall examine several of the simplest results pertaining to this 
problem. 


55. APPROXIMATION IN THE MEAN 


We shall show, first of all, how Fourier coefficients make their 
appearance quite naturally in connection with the solution of a 
completely different problem. 

Let f(x) be bounded and integrable over [−π, π]. We wish to find for this function an approximate expression in the form of a trigonometric polynomial of order n,

$$\Pi_n(x) = \frac{\alpha_0}{2} + \sum_{k=1}^{n} (\alpha_k \cos kx + \beta_k \sin kx).$$

We inquire how we should select the coefficients $\alpha_k$ and $\beta_k$ so as to make this approximation as accurate as possible. Of course, this formulation of our problem is not yet well defined, since we have not yet stated how the error of approximation is to be measured. It is clear that we have a wide range of choice. For example, we could accept as the error of approximation the least upper bound of the difference

$$|f(x) - \Pi_n(x)|$$

for −π ≤ x ≤ π. Another possible choice is to evaluate the error of approximation by means of the integral

$$\int_{-\pi}^{\pi} |f(x) - \Pi_n(x)| \, dx.$$

Finally, it is possible to evaluate this error by means of the integral

$$\int_{-\pi}^{\pi} [f(x) - \Pi_n(x)]^2 \, dx. \qquad (9)$$


This last expression is, from a formal point of view, the most con- 
venient both in theoretical investigations and in practical computa- 
tion. It does not contain the absolute value sign, the presence 
of which very frequently makes analytic operations more difficult. It is this last form of evaluating the error which we shall consider. The approximation corresponding to such a definition of the magnitude of error is usually called the approximation in the mean.

We now have before us a well-defined problem: that of choosing the numbers $\alpha_k$ and $\beta_k$ in such a way that the value of the integral (9) is the smallest possible. It is evident that this integral is a function of 2n + 1 variables $\alpha_0, \alpha_1, \beta_1, \ldots, \alpha_n, \beta_n$, so that we have to deal with a multidimensional minimum problem. However, the special character of this problem permits us to solve it without using the methods of the differential calculus.

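Let us write the integral (9) in the form

$$\int_{-\pi}^{\pi} f^2(x)\,dx + \int_{-\pi}^{\pi} \Pi_n^2(x)\,dx - 2\int_{-\pi}^{\pi} f(x)\,\Pi_n(x)\,dx.$$

Denoting by $a_k$ and $b_k$ the Fourier coefficients of f(x), by virtue of the formulas (7) and (8) we have

$$\int_{-\pi}^{\pi} f(x)\,\Pi_n(x)\,dx = \pi\left\{\frac{\alpha_0 a_0}{2} + \sum_{k=1}^{n} (\alpha_k a_k + \beta_k b_k)\right\}. \qquad (10)$$

If we further take into consideration that

$$\int_{-\pi}^{\pi} \cos^2 kx \, dx = \int_{-\pi}^{\pi} \sin^2 kx \, dx = \pi \qquad (k \ge 1),$$

the orthogonality of the system (5) at once gives us the expression

$$\int_{-\pi}^{\pi} \Pi_n^2(x)\,dx = \pi\left\{\frac{\alpha_0^2}{2} + \sum_{k=1}^{n} (\alpha_k^2 + \beta_k^2)\right\}. \qquad (11)$$

Combining (10) and (11) and making use of the elementary relations

$$\alpha_k^2 - 2a_k\alpha_k = (\alpha_k - a_k)^2 - a_k^2, \qquad \beta_k^2 - 2b_k\beta_k = (\beta_k - b_k)^2 - b_k^2,$$

we obtain

$$\int_{-\pi}^{\pi} \Pi_n^2(x)\,dx - 2\int_{-\pi}^{\pi} f(x)\,\Pi_n(x)\,dx = \pi\left\{\frac{(\alpha_0 - a_0)^2}{2} + \sum_{k=1}^{n}\left[(\alpha_k - a_k)^2 + (\beta_k - b_k)^2\right]\right\} - \pi\left\{\frac{a_0^2}{2} + \sum_{k=1}^{n} (a_k^2 + b_k^2)\right\}.$$

Consequently,

$$\int_{-\pi}^{\pi} [f(x) - \Pi_n(x)]^2\,dx = \int_{-\pi}^{\pi} f^2(x)\,dx - \pi\left\{\frac{a_0^2}{2} + \sum_{k=1}^{n} (a_k^2 + b_k^2)\right\} + \pi\left\{\frac{(\alpha_0 - a_0)^2}{2} + \sum_{k=1}^{n}\left[(\alpha_k - a_k)^2 + (\beta_k - b_k)^2\right]\right\}.$$

On the right side, only the last term depends upon the numbers $\alpha_k$ and $\beta_k$, so we have to find the minimum of this last component. All its terms are nonnegative. It will therefore attain its minimum value (zero) when all these terms become equal to zero, that is, when

$$\alpha_0 = a_0, \qquad \alpha_k = a_k, \qquad \beta_k = b_k \qquad (k = 1, 2, \ldots, n).$$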
Thus, the solution of our problem reads: 


THEOREM 3. Let f(x) be bounded and integrable on [−π, π]. Then, from among all trigonometric polynomials of order n, the best approximation in the mean to f(x) is provided by the trigonometric polynomial whose coefficients are the corresponding Fourier coefficients of this function.
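Theorem 3, and the identity behind it, can be tested numerically. In the sketch below, f(x) = x, the order n = 3, the random perturbation sizes and the helper names `mean_square_error` and `perturbed_gap` are all our own illustrative choices. For f(x) = x, formulas (7) and (8) give $a_k = 0$ and $b_k = 2(-1)^{k+1}/k$. Moving the coefficients away from these values should increase the integral (9) by exactly $\pi\{(\alpha_0 - a_0)^2/2 + \sum[(\alpha_k - a_k)^2 + (\beta_k - b_k)^2]\}$, up to quadrature error.

```python
import math
import random

def simpson(g, lo, hi, m=2000):
    """Composite Simpson's rule for the integral of g over [lo, hi] (m even subintervals)."""
    if m % 2:
        m += 1
    h = (hi - lo) / m
    total = g(lo) + g(hi)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

def mean_square_error(f, alpha, beta):
    """Integral (9) for Pi_n with coefficients alpha[0..n], beta[1..n] (beta[0] unused)."""
    n = len(alpha) - 1
    def Pi(x):
        return alpha[0] / 2 + sum(alpha[k] * math.cos(k * x) + beta[k] * math.sin(k * x)
                                  for k in range(1, n + 1))
    return simpson(lambda x: (f(x) - Pi(x)) ** 2, -math.pi, math.pi)

def f(x):
    return x

# Fourier coefficients of f(x) = x on [-pi, pi]: a_k = 0 and b_k = 2(-1)^(k+1)/k.
n = 3
a = [0.0] * (n + 1)
b = [0.0] + [2 * (-1) ** (k + 1) / k for k in range(1, n + 1)]
best = mean_square_error(f, a, b)

def perturbed_gap(rng):
    """Increase of (9) under random coefficient changes, and the increase the identity predicts."""
    da = [rng.uniform(-0.3, 0.3) for _ in range(n + 1)]
    db = [0.0] + [rng.uniform(-0.3, 0.3) for _ in range(n)]
    worse = mean_square_error(f, [c + d for c, d in zip(a, da)], [c + d for c, d in zip(b, db)])
    predicted = math.pi * (da[0] ** 2 / 2 + sum(da[k] ** 2 + db[k] ** 2 for k in range(1, n + 1)))
    return worse - best, predicted

rng = random.Random(0)
print("error with Fourier coefficients:", best)
for _ in range(3):
    print("increase %.8f, predicted %.8f" % perturbed_gap(rng))
```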


Denoting this trigonometric polynomial by $P_n(x)$, we have

$$\int_{-\pi}^{\pi} [f(x) - P_n(x)]^2\,dx = \int_{-\pi}^{\pi} f^2(x)\,dx - \pi\left\{\frac{a_0^2}{2} + \sum_{k=1}^{n} (a_k^2 + b_k^2)\right\}.$$

Incidentally, this relation proves another very important theorem. Since the left side is obviously nonnegative, for any n we have

$$\frac{a_0^2}{2} + \sum_{k=1}^{n} (a_k^2 + b_k^2) \le \frac{1}{\pi}\int_{-\pi}^{\pi} f^2(x)\,dx.$$

The right side of this inequality does not depend on n. The left side, a nondecreasing function of n, therefore remains bounded as n increases. This gives us:

THEOREM 4. The series

$$\sum_{k=1}^{\infty} (a_k^2 + b_k^2)$$

is convergent for any bounded and integrable function f(x). In particular, it follows from this that the Fourier coefficients $a_k$ and $b_k$ of every such function tend to zero as k → ∞.


These results, obtained by an altogether elementary argument, 
have very great significance for the theory of Fourier series. 
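The inequality behind Theorem 4 is easy to observe for a concrete function. In the sketch below (our own choice of example), f(x) = x on [−π, π], for which $a_k = 0$ and $b_k = 2(-1)^{k+1}/k$. The right side of the inequality is then $(1/\pi)\int_{-\pi}^{\pi} x^2\,dx = 2\pi^2/3$. The function name `bessel_partial_sums` is hypothetical. For this particular f the gap closes as n grows; that equality holds in the limit (Parseval's equality) is a known fact which is not proved in this lecture.

```python
import math

def bessel_partial_sums(N):
    """Partial sums a_0^2/2 + sum_{k=1}^{n} (a_k^2 + b_k^2), n = 1..N, for f(x) = x on [-pi, pi].

    For this f, formulas (7) and (8) give a_k = 0 and b_k = 2(-1)^(k+1)/k.
    """
    sums, s = [], 0.0
    for k in range(1, N + 1):
        b_k = 2 * (-1) ** (k + 1) / k
        s += b_k ** 2
        sums.append(s)
    return sums

# Right side of the inequality: (1/pi) * integral of x^2 over [-pi, pi] = 2 pi^2 / 3.
bound = (1 / math.pi) * (2 * math.pi ** 3 / 3)
sums = bessel_partial_sums(10000)
print(sums[0], sums[9], sums[-1], bound)
```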
56. COMPLETENESS OF THE SYSTEM OF
TRIGONOMETRIC FUNCTIONS 


As we have seen in the theory of power series, one and the 
same series can serve as the Maclaurin series for infinitely many 
different functions. Is the analogous phenomenon possible for 
Fourier series? This question is, as we shall presently show, closely 
connected with a certain remarkable property of the system (5) of 
trigonometric functions, the property of completeness. 

Suppose a series of the form (4) were the Fourier series of two different functions $f_1(x)$ and $f_2(x)$, both continuous¹ on the interval [−π, π]. Then the corresponding Fourier coefficients of these two functions would coincide. The difference

$$f(x) = f_1(x) - f_2(x)$$

of these two functions, though not identically equal to zero on [−π, π], would have all its Fourier coefficients equal to zero. But, by formulas (7) and (8), this would be equivalent to the statement that f(x) is orthogonal to every function of the system (5). In this case, the orthogonal system (5) would not be, as we say, complete. That is, we could add to it a new function, not identically equal to zero, such that the enlarged system would remain orthogonal. And, clearly, the argument can be reversed. If the system (5) is not complete, there is a continuous function $f_0(x)$, not identically zero, which is orthogonal to all the functions of the system (5), so that all its Fourier coefficients are equal to zero. But then all functions of the form

$$f(x) + \alpha f_0(x),$$

where α is any real number, have one and the same Fourier series. We shall now prove the following.

¹ If we omit the condition of continuity, then the answer to our question of uniqueness is trivially negative, since two functions differing from each other at only one point have the same set of Fourier coefficients.

THEOREM 5. The system (5) is complete; in other words, any continuous function orthogonal to all functions of (5) must be identically equal to zero.

Proof. Let f(x) be such a function. We then have

$$\int_{-\pi}^{\pi} f(x)\,T(x)\,dx = 0$$

for any trigonometric polynomial T(x). Suppose that f(x) is not identically equal to zero and, to be definite, suppose that f(x) > 0 at x = a. Then,¹ as we know from the theory of continuous functions, we can find positive numbers c and δ with the following property: for −π < a − δ ≤ x ≤ a + δ < π, we have the inequality

$$f(x) > c.$$

Consider now the expression

$$T_n(x) = \left(\frac{1 + \cos(x - a)}{2}\right)^n.$$

Expanding by the binomial formula, we find for the function $T_n(x)$ an expression of the form

$$T_n(x) = \sum_{r=0}^{n} c_r \cos^r (x - a).$$

But, as we know from trigonometry, any power $\cos^r x$ can be expressed as a linear combination of the functions

$$1, \ \cos x, \ \cos 2x, \ \ldots, \ \cos rx$$

with constant coefficients. (This can be proved very simply by induction, using $\cos x \cos kx = \frac{1}{2}[\cos(k + 1)x + \cos(k - 1)x]$.) Thus we obtain

$$T_n(x) = \sum_{r=0}^{n} d_r \cos r(x - a).$$

Finally, taking into consideration that

$$\cos r(x - a) = \cos ra \cos rx + \sin ra \sin rx,$$

we shall arrive at the expression

$$T_n(x) = \frac{\alpha_0}{2} + \sum_{r=1}^{n} (\alpha_r \cos rx + \beta_r \sin rx),$$

where $\alpha_r$ and $\beta_r$ are constant coefficients. Thus, $T_n(x)$ is a trigonometric polynomial for every n, and, consequently,

$$\int_{-\pi}^{\pi} f(x)\,T_n(x)\,dx = 0 \qquad (n = 1, 2, \ldots). \qquad (12)$$

¹ It is obvious that, under our conditions, we can take a to be an interior point of the interval [−π, π].
Let us now consider the behavior of $T_n(x)$ for large values of n. The quantity

$$\frac{1 + \cos(x - a)}{2}$$

clearly lies between 0 and 1 in the interval [−π, π] and is equal to 1 only at x = a. It follows that, for large values of n, the quantity $T_n(x)$ is nonnegative, equal to 1 at x = a, and negligibly small for all values of x differing appreciably from a. In other words, this function has a graph of the type shown in Figure 33 (p. 185), with two differences: the interval [−1, 1] has to be replaced by the interval [−π, π], and the maximum has to be transferred from the point x = 0 to the point x = a.
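Both properties of $T_n(x)$ used in the proof can be checked numerically, under choices of our own (a = 0.5, n = 4, δ = 0.4). The sketch has two parts. First, it computes the coefficients of $T_n$ by formulas (7) and (8) and confirms that every coefficient of index greater than n vanishes, so that $T_n$ is indeed a trigonometric polynomial of order n. Second, it checks that outside [a − δ, a + δ] the values of $T_{50}$ do not exceed $p^{50}$, where $p = \cos^2(\delta/2)$ (the bound established below as (15)). The helper names `T` and `trig_coefficients` are hypothetical.

```python
import math

def simpson(g, lo, hi, m=2000):
    """Composite Simpson's rule for the integral of g over [lo, hi] (m even subintervals)."""
    if m % 2:
        m += 1
    h = (hi - lo) / m
    total = g(lo) + g(hi)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

def T(n, a):
    """The function T_n(x) = ((1 + cos(x - a)) / 2)^n used in the proof."""
    return lambda x: ((1 + math.cos(x - a)) / 2) ** n

def trig_coefficients(g, R):
    """Coefficients alpha_r (r = 0..R) and beta_r (r = 1..R) of g by formulas (7) and (8)."""
    alpha = [simpson(lambda x: g(x) * math.cos(r * x), -math.pi, math.pi) / math.pi
             for r in range(R + 1)]
    beta = [0.0] + [simpson(lambda x: g(x) * math.sin(r * x), -math.pi, math.pi) / math.pi
                    for r in range(1, R + 1)]
    return alpha, beta

n, a = 4, 0.5
alpha, beta = trig_coefficients(T(n, a), n + 3)
print([round(c, 6) for c in alpha])
print([round(c, 6) for c in beta])

# Concentration: outside [a - delta, a + delta], T_n(x) <= p^n with p = cos^2(delta / 2).
delta = 0.4
p = math.cos(delta / 2) ** 2
xs = [-math.pi + 2 * math.pi * i / 1000 for i in range(1001)]
outside = [x for x in xs if abs(x - a) >= delta]
print(max(T(50, a)(x) for x in outside), p ** 50)
```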

The outline of our further reasoning is now clear. For large values of n, the function $T_n(x)$ is negligibly small outside the interval [a − δ, a + δ], so the corresponding parts of the integral (12) are also negligibly small. As a result, its sign is determined by the sign of the integral

$$\int_{a-\delta}^{a+\delta} f(x)\,T_n(x)\,dx. \qquad (13)$$

In the integrand of (13), $T_n(x) \ge 0$ and f(x) > c, so the integral (13) is positive. Hence the integral (12) cannot be equal to zero, and we thus arrive at a contradiction. It remains for us, now, to demonstrate that the parts of the integral (12) which we have neglected are small in absolute value in comparison with (13). In order to carry out the computation necessary for this purpose, let us set

$$\int_{a-\delta}^{a+\delta} f(x)\,T_n(x)\,dx = I_1 \quad \text{and} \quad \int_{-\pi}^{a-\delta} f(x)\,T_n(x)\,dx + \int_{a+\delta}^{\pi} f(x)\,T_n(x)\,dx = I_2,$$

so that

$$\int_{-\pi}^{\pi} f(x)\,T_n(x)\,dx = I_1 + I_2.$$

Noting that

$$\frac{1 + \cos(x - a)}{2} = \cos^2\frac{x - a}{2},$$

we have, setting x − a = 2y and using the evenness of the cosine,

$$I_1 > c\int_{a-\delta}^{a+\delta} \cos^{2n}\frac{x - a}{2}\,dx = 4c\int_0^{\delta/2} \cos^{2n} y\,dy.$$
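But, for 0 ≤ y ≤ δ/2 < π/2, we have

$$0 \le \sin y < 1 \quad \text{and} \quad 0 < \cos y \le 1,$$

from which $\cos^{2n} y \ge \cos^{2n} y \cos y$ and, therefore,

$$I_1 > 4c\int_0^{\delta/2} \cos^{2n} y\,\cos y\,dy.$$

Or, if we set sin y = z (and note that $1 - z^2 \ge 1 - z$ for 0 ≤ z < 1),

$$I_1 > 4c\int_0^{\sin(\delta/2)} (1 - z^2)^n\,dz \ge 4c\int_0^{\sin(\delta/2)} (1 - z)^n\,dz = \frac{4c}{n + 1}\left\{1 - \left(1 - \sin\frac{\delta}{2}\right)^{n+1}\right\}.$$

Since for sufficiently large n the expression within the braces is greater than 1/2, we have

$$I_1 > \frac{2c}{n + 1}. \qquad (14)$$

On the other hand, let M denote the maximum value of |f(x)| in [−π, π]. We note that in the domain of integration of the two integrals which form $I_2$

$$\cos^2\frac{x - a}{2} \le \cos^2\frac{\delta}{2},$$

and hence

$$T_n(x) \le \cos^{2n}\frac{\delta}{2}.$$

We now obtain

$$|I_2| \le M\cos^{2n}\frac{\delta}{2}\,(a - \delta + \pi + \pi - a - \delta) < 2\pi M\cos^{2n}\frac{\delta}{2} = 2\pi M p^n, \qquad (15)$$

where

$$p = \cos^2\frac{\delta}{2} < 1.$$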
Now to complete the proof. If f(x) were orthogonal to the system (5), then we would have

$$\int_{-\pi}^{\pi} f(x)\,T_n(x)\,dx = I_1 + I_2 = 0.$$

Hence, $|I_1| = |I_2|$ and, by (14) and (15), we would have, for n sufficiently large,

$$\frac{2c}{n + 1} < 2\pi M p^n, \qquad \text{that is,} \qquad (n + 1)p^{n+1} > \frac{cp}{\pi M}.$$

But this gives us the required contradiction, for $np^n \to 0$ as n → ∞. Therefore $(n + 1)p^{n+1}$ cannot remain greater than the constant positive number $cp/(\pi M)$ as n increases.¹

¹ Setting $n \ln\frac{1}{p} = x$, we have $np^n = \dfrac{x e^{-x}}{\ln(1/p)} \to 0$ as x → ∞, as we have seen in the footnote on page 186.

We have thus proved the completeness of the orthogonal system (5). As we have already seen, it follows from this that a trigonometric series (4) can serve as the Fourier series for no more than one continuous function. In particular, if this series is uniformly convergent, its sum is the only continuous function having this series as its Fourier series.


57. CONVERGENCE OF FOURIER SERIES FOR FUNCTIONS
WITH A BOUNDED INTEGRABLE DERIVATIVE


We shall now show that every function f(x) with a period of 2π which has a bounded and integrable derivative can be expanded into a uniformly convergent trigonometric series (which is, therefore, its Fourier series).

Let us agree to designate the Fourier coefficients of f(x) by $a_k$ and $b_k$, and those of its derivative f′(x) by $a_k'$ and $b_k'$. Integration by parts, then, gives

$$\pi a_k = \int_{-\pi}^{\pi} f(x)\cos kx\,dx = \left[\frac{f(x)\sin kx}{k}\right]_{-\pi}^{\pi} - \frac{1}{k}\int_{-\pi}^{\pi} f'(x)\sin kx\,dx = -\frac{\pi b_k'}{k},$$

and similarly, $b_k = a_k'/k$. In the latter case, the boundary term $[-f(x)\cos kx/k]_{-\pi}^{\pi}$ vanishes because f(π) = f(−π). In what follows, it will be convenient to apply the so-called Cauchy–Schwarz inequality:²

$$\left(\sum_{k=1}^{n} u_k v_k\right)^2 \le \sum_{k=1}^{n} u_k^2 \sum_{k=1}^{n} v_k^2,$$

which is valid for any real numbers $u_k$ and $v_k$ and for any n.
² The sum $\sum_{k=1}^{n} (u_k x + v_k)^2$ is, as a function of x, a nonnegative quadratic trinomial; hence its discriminant satisfies

$$4\left[\left(\sum_{k=1}^{n} u_k v_k\right)^2 - \sum_{k=1}^{n} u_k^2 \sum_{k=1}^{n} v_k^2\right] \le 0.$$

Noting that

$$a_k \cos kx + b_k \sin kx = \frac{1}{k}\left(-b_k' \cos kx + a_k' \sin kx\right),$$

and utilizing the Cauchy–Schwarz inequality and the inequality

$$(\alpha\cos\varphi + \beta\sin\varphi)^2 = \alpha^2 + \beta^2 - (\alpha\sin\varphi - \beta\cos\varphi)^2 \le \alpha^2 + \beta^2,$$

we find that for m > n

$$\left[\sum_{k=n}^{m} \frac{1}{k}\left(-b_k'\cos kx + a_k'\sin kx\right)\right]^2 \le \sum_{k=n}^{m}\frac{1}{k^2}\,\sum_{k=n}^{m}\left(-b_k'\cos kx + a_k'\sin kx\right)^2 \le \sum_{k=n}^{m}\frac{1}{k^2}\,\sum_{k=n}^{m}\left(a_k'^2 + b_k'^2\right).$$

But the series $\sum 1/k^2$ is convergent, and so is the series $\sum (a_k'^2 + b_k'^2)$, since $a_k'$ and $b_k'$ are the Fourier coefficients of the bounded integrable function f′(x) (see p. 196). Thus, on the right side of the last inequality both factors tend to zero for any m > n as n → ∞ (by the Cauchy condition). This means that

$$\lim_{\substack{n \to \infty \\ m > n}} \sum_{k=n}^{m} (a_k\cos kx + b_k\sin kx) = 0.$$

Moreover, the convergence is uniform with respect to x, since the right side of the preceding inequality does not depend on x. It follows by the Cauchy condition that the series (4) is uniformly convergent in the interval [−π, π]. Denoting its sum by s(x), we see that (4) is the Fourier series of both functions f(x) and s(x). But then these functions, being continuous, must coincide, and our theorem is established.
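A concrete function satisfying the hypotheses of this section is f(x) = |x| on [−π, π]: here f(−π) = f(π), and the derivative is bounded (it equals ±1 away from 0). Elementary integration in (7) and (8) gives its coefficients: $a_0 = \pi$, $a_k = 2((-1)^k - 1)/(\pi k^2)$ and $b_k = 0$. For the derivative f′(x) = sign x, it gives $b_k' = 2(1 - (-1)^k)/(\pi k)$ and $a_k' = 0$. The sketch below, which is our own illustration with hypothetical helper names, checks three things. First, the relation $a_k = -b_k'/k$ derived above. Second, that the uniform error of the partial sums decreases. Third, that any block of terms from k = n to k = m is bounded, for every x, by the Cauchy–Schwarz quantity from the proof.

```python
import math

def a_coef(k):
    """Fourier cosine coefficients of f(x) = |x| on [-pi, pi] (b_k = 0 since f is even)."""
    if k == 0:
        return math.pi
    return 2 * ((-1) ** k - 1) / (math.pi * k * k)

def b_prime(k):
    """Sine coefficients b'_k of f'(x) = sign(x) (a'_k = 0 since f' is odd)."""
    return 2 * (1 - (-1) ** k) / (math.pi * k)

def partial_sum(x, N):
    """N-th partial sum of the Fourier series (4) of |x|."""
    return a_coef(0) / 2 + sum(a_coef(k) * math.cos(k * x) for k in range(1, N + 1))

def sup_error(N, samples=2001):
    """max |f(x) - S_N(x)| over an equally spaced grid of [-pi, pi]."""
    xs = [-math.pi + 2 * math.pi * i / (samples - 1) for i in range(samples)]
    return max(abs(abs(x) - partial_sum(x, N)) for x in xs)

def cauchy_schwarz_bound(n, m):
    """sqrt(sum_{k=n}^{m} 1/k^2 * sum_{k=n}^{m} (a'_k^2 + b'_k^2)); does not depend on x."""
    s1 = sum(1 / k ** 2 for k in range(n, m + 1))
    s2 = sum(b_prime(k) ** 2 for k in range(n, m + 1))
    return math.sqrt(s1 * s2)

for N in (4, 16, 64):
    print(f"N = {N:3d}   sup |f - S_N| = {sup_error(N):.5f}")
print("bound for the block k = 10..40:", cauchy_schwarz_bound(10, 40))
```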

The modern theory of trigonometric series proves the conver- 
gence of Fourier series for significantly more extensive classes of 
functions than the class which we have considered. This extension, 
however, cannot actually be carried very far. We know that, even 
among the continuous functions, there exist some whose Fourier 
series do not converge for some values of x. Here, of course, 
we shall not be able to discuss these problems in greater detail. Let 
us simply note that the theory of trigonometric series, to which an 
extensive literature, including many textbooks, is devoted, includes 
to this day a great many important problems for which solutions 
have not yet been found. Hence, the field naturally attracts the 
efforts of many investigators. 


58. EXTENSION TO ARBITRARY INTERVALS 


So far we have been considering functions defined on [−π, π], and only on this interval did we seek the expansion of the given function in a trigonometric series. Moreover, we tacitly assumed that f(π) = f(−π), since only under this condition can a function f(x) be represented by a series of the form (4) at all points of the closed interval [−π, π]. Now we shall see how we can free ourselves from these restrictions. The theory of trigonometric series can clearly gain wide application only if we can expand functions defined on arbitrary intervals, not restricted by any requirements of periodicity.

First of all, as we pointed out at the very beginning, nothing of what has been said will be changed if we replace the interval [−π, π] by any interval [a, a + 2π] of length 2π, provided f(a + 2π) = f(a). Thus, in our exposition we set requirements only on the length of the interval under consideration, not on its position. We also required of the function f(x) that its values at the end points of the given interval be equal.

Suppose now that we wish to expand in a trigonometric series a function f(x) defined on a completely arbitrary interval [a, b] and not subject to any conditions of periodicity. As before, we shall assume only that f(x) is differentiable at each point of [a, b] and that its derivative is bounded and integrable in this interval.


Let us assume first that b − a < 2π. Obviously, it is possible to define in infinitely many ways a function f*(x) on the interval [a, a + 2π] having in this interval a bounded and integrable derivative, satisfying the condition f*(a + 2π) = f*(a), and coinciding with f(x) at all points of [a, b]. By what we have already proved, f*(x) can be expanded in the interval [a, a + 2π] into a uniformly convergent trigonometric series. It is obvious that in the interval [a, b] this series represents the function f(x) and thus completely solves our problem.

Now let b − a be greater than 2π. In this case, we first take any number b′ > b and define on the interval [a, b′] a function f*(x) in such a way that it has in this interval a bounded and integrable derivative, satisfies the condition f*(b′) = f*(a), and coincides with f(x) at all points of [a, b]. Suppose, in addition, that we construct f*(x) so that f*′(b′) = f*′(a) (which, of course, is always possible) and extend f*(x) as a periodic function beyond the limits of the interval [a, b′] in both directions. We then obtain a periodic function f*(x) having the period b′ − a = 2l > 2π. This function has a bounded and integrable derivative in any interval and coincides with the function f(x) in the interval [a, b].

Let us now set

$$x = \frac{l}{\pi}\,y \quad\text{and}\quad f^*(x) = f^*\!\left(\frac{l}{\pi}\,y\right) = \varphi(y).$$


Clearly, y increases from −π to π as x increases from −l to l. The function φ(y) has a bounded and integrable derivative in the interval [−π, π], while φ(−π) = f*(−l) = f*(l) = φ(π). By our theory, the Fourier series of φ(y) converges uniformly to this function in the interval [−π, π]:

$$\varphi(y) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left(a_n \cos ny + b_n \sin ny\right) \qquad (-\pi \le y \le \pi).$$


Substituting y = (π/l)x, we obtain

$$f^*(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty}\left(a_n \cos\frac{n\pi}{l}x + b_n \sin\frac{n\pi}{l}x\right), \qquad (16)$$

this series converging uniformly in the interval −l ≤ x ≤ l. The functions cos(nπx/l) and sin(nπx/l), like the function f*(x), have the period 2l. Hence the expansion (16) is valid uniformly on the whole real line. In particular, in the interval [a, b] we shall have uniformly

$$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty}\left(a_n \cos\frac{n\pi}{l}x + b_n \sin\frac{n\pi}{l}x\right) \qquad (a \le x \le b).$$
We see then that f(x) is expressed in the interval [a, b] by a trigonometric series differing from the series (4) only in its elements. Instead of the functions cos nx and sin nx, the elements are the functions cos(nπx/l) and sin(nπx/l), whose period is 2l instead of 2π. This result solves our problem in the best possible way. In view of the inequality b − a > 2π, the function f(x) in the interval [a, b] cannot, in general, be expressed by a series of the form (4), whose terms have the period 2π.

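We have still to see how the coefficients aₙ and bₙ of the series representing f(x) can be obtained from the function. For this purpose, we note that by our initial definition

$$a_k = \frac{1}{\pi}\int_{-\pi}^{\pi} \varphi(y)\cos ky\,dy.$$

Substituting y = (π/l)x, we find

$$a_k = \frac{1}{\pi}\int_{-\pi}^{\pi} f^*\!\left(\frac{l}{\pi}y\right)\cos ky\,dy = \frac{1}{l}\int_{-l}^{l} f^*(x)\cos\frac{k\pi}{l}x\,dx = \frac{1}{l}\int_{a}^{b'} f^*(x)\cos\frac{k\pi}{l}x\,dx.$$

The last step holds because the integrand has period 2l = b′ − a, so its integral over any interval of length 2l is the same. Similarly,

$$b_k = \frac{1}{l}\int_{a}^{b'} f^*(x)\sin\frac{k\pi}{l}x\,dx.$$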


Since there exist infinitely many functions f*(x) satisfying the required conditions, the values of f(x) on [a, b] do not determine the coefficients aₖ and bₖ uniquely. But this should not surprise us. The series (16) is called upon to represent f(x) only in the interval [a, b], whose length is less than the period 2l of the elements cos(kπx/l) and sin(kπx/l). Similarly, for any proper subinterval of [−π, π], there are infinitely many trigonometric series (4) representing f(x) in this subinterval.
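The coefficient formulas above can be checked numerically. The following is a minimal Python sketch under assumptions of our own.

- The function is f(x) = x²(8 − x)² on [a, b] = [0, 7], so b − a = 7 > 2π.
- We take b′ = 8, so 2l = 8. The same polynomial then serves as f*(x) on [0, 8], since f*(8) = f*(0) and f*′(8) = f*′(0).
- The integrals are approximated by the midpoint rule, which is simply a convenient choice.
- The names `fourier_coefficients` and `partial_sum` are hypothetical.

```python
import math

def fourier_coefficients(f_star, a, ell, n_terms, m=4000):
    """Midpoint-rule approximations of
    a_k = (1/l) * integral_a^{a+2l} f*(x) cos(k pi x / l) dx and the matching b_k,
    for k = 0, ..., n_terms (ell plays the role of l)."""
    h = 2 * ell / m
    xs = [a + (j + 0.5) * h for j in range(m)]
    vals = [f_star(x) for x in xs]
    a_coef, b_coef = [], []
    for k in range(n_terms + 1):
        w = k * math.pi / ell
        a_coef.append(h / ell * sum(v * math.cos(w * x) for x, v in zip(xs, vals)))
        b_coef.append(h / ell * sum(v * math.sin(w * x) for x, v in zip(xs, vals)))
    return a_coef, b_coef

def partial_sum(a_coef, b_coef, ell, x):
    """a_0/2 + sum over k >= 1 of a_k cos(k pi x / l) + b_k sin(k pi x / l)."""
    total = a_coef[0] / 2
    for k in range(1, len(a_coef)):
        w = k * math.pi / ell
        total += a_coef[k] * math.cos(w * x) + b_coef[k] * math.sin(w * x)
    return total

# Hypothetical example: f(x) = x^2 (8 - x)^2 on [a, b] = [0, 7], so b - a > 2 pi.
# With b' = 8 (period 2l = 8) the same polynomial serves as f* on [0, 8],
# since f*(8) = f*(0) = 0 and f*'(8) = f*'(0) = 0.
ell = 4.0
def f(x):
    return x**2 * (2 * ell - x) ** 2

a_coef, b_coef = fourier_coefficients(f, 0.0, ell, n_terms=20)
grid = [7.0 * i / 700 for i in range(701)]            # sample points of [a, b]
max_err = max(abs(partial_sum(a_coef, b_coef, ell, x) - f(x)) for x in grid)
print(a_coef[0] / 2, max_err)
```

For this particular f*, four integrations by parts give a₀/2 = 8l⁴/15, aₖ = −48l⁴/(kπ)⁴ and bₖ = 0 for k ≥ 1. Twenty terms therefore already leave an error of only a few thousandths on [a, b]. A different admissible f* would change the coefficients but not the values on [a, b].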

8. Differential Equations


59. FUNDAMENTAL CONCEPTS 


At the end of Lecture 6, we mentioned what every integration process aims at. Starting from the given local characteristics of the phenomenon under consideration, it seeks the global characteristics, that is, the description of the phenomenon as a whole.
Problems of this type are extremely numerous in all applied 
sciences which employ the resources of mathematical analysis. 
However, the extremely varied requirements set for the mathemat- 
ical apparatus by problems of this kind cannot always be satisfied 
with the help of a tool so elementary, from the conceptual point of 
view, as either the ordinary or the multiple integral. Only in a rela- 
tively small number of the most primitive cases does the simple 
apparatus of integration turn out to be adequate for the solution of 
the given problem. 

More often the situation is such that the known local character- 
istics of the phenomenon under consideration lead to a system of 
differential equations in which the unknowns are the very functions 
which describe the global characteristics of this phenomenon. The 
solution of a given concrete problem is thus reduced to the solu- 
tion of a system of differential equations, that is, to the determina- 
tion of the unknown functions appearing in these equations. This 
mathematical problem is much more complicated than the ordinary integration of functions. In terms of the theory of differential equations, ordinary integration is only the simplest particular case and, moreover, a trivial one. Indeed, suppose the solution of some type of differential equation can, by one means or another, be reduced to the ordinary integration of functions (or, in the language of the theory of differential equations, to quadratures). The problem is then considered solved.

It will be useful to observe by means of a very simple example 
how the knowledge of the local characteristics of a phenomenon 
can give rise to a differential equation whose solution permits us 
later to describe this phenomenon as a whole. 
Let us imagine a container whose capacity is a liters, filled with a salt solution. Let there be a continuous flow into and out of the container: in one unit of time the flow adds to it b liters of pure water and drains from it the same amount of solution. Let us assume, further, that at a certain initial moment t = 0 the container held c kg. of salt. We want to know the number x of kilograms of salt our container will retain after t units of time. (We assume, in addition, that the mixing of the solution is so rapid that we can regard the salt concentration as being at every moment the same in all parts of the container.)

What local characteristics of the phenomenon are given in the above problem? We know that the solution flows out of the container at the rate of b liters per unit of time. At a given moment t the container holds an unknown quantity x kg. of salt. Since the capacity of the container is a liters, every liter of the solution contains x/a kg. of salt. Hence, b liters contain bx/a = sx kg. of salt, where we shall write s for b/a. Suppose that, during a unit of time starting from the moment t, the concentration of the solution remained constant. Then the amount of salt in the container would decrease during this unit of time by sx kg. This, therefore, is the magnitude of the rate of decrease of the amount of salt in the container at the moment t. We thus have


$$\frac{dx}{dt} = -sx. \qquad (1)$$

This formula expresses the instantaneous rate of change in the amount of salt in terms of the amount of salt present at a given moment (note that dx/dt is negative, since x is decreasing). It gives us a local characterization of the phenomenon. (Here it would, of course, be more convenient to say a momentary or instantaneous characterization.) The function x = x(t), which we seek, is the unknown in the equation.

Can this function be found directly by the methods of integral calculus? What is given is a formula expressing its derivative; and finding a function when its derivative is given is precisely the fundamental problem of the integral calculus. Nevertheless, this problem is not altogether ordinary. The derivative of the required function is expressed not, as we are accustomed, in terms of the independent variable, but in terms of the unknown function itself.
The integral calculus does not directly undertake the solution 
of problems of this kind, and, therefore, we must formally deal 
with a basically new problem: the solution of the differential equa- 
tion (1). In the present case, of course, the problem reduces in 
a trivial way to a problem in integral calculus. Writing (1) in the 
form 


$$\frac{dx}{x} = -s\,dt,$$


we, in the customary terminology, separate the variables. A simple integration gives ln x on the left side and −st on the right, so that

$$\ln x = -st + k,$$


where k is a constant. To determine this number, we turn to the initial condition, as is characteristically done when solving differential equations: at t = 0 we have x = c. This gives us k = ln c, so that, finally,

$$x = ce^{-st}.$$


The given problem is thus completely solved. We see that as 
the time increases the amount of salt in the container decreases 
exponentially. 
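As a numerical illustration (not part of the original lecture), the law (1) can be followed step by step: over a short time h the salt content changes by about −sx·h. The Python sketch below uses this rule, known as Euler's method, and compares the result at t = 10 with x = ce^{−st}. The data a = 100, b = 5, c = 10 and the helper name `euler` are hypothetical.

```python
import math

def euler(rate, t0, x0, t1, steps):
    """Euler's method for dx/dt = rate(t, x): repeatedly follow the current rate for a step h."""
    h = (t1 - t0) / steps
    t, x = t0, x0
    for _ in range(steps):
        x += h * rate(t, x)
        t += h
    return x

# Hypothetical data: capacity a = 100 liters, flow b = 5 liters per unit time, c = 10 kg initially.
a, b, c = 100.0, 5.0, 10.0
s = b / a
T = 10.0
exact = c * math.exp(-s * T)                       # x(T) = c e^{-sT}
approx = euler(lambda t, x: -s * x, 0.0, c, T, 10_000)
print(exact, approx)
```

With 10,000 steps the two values agree to about four decimal places. Euler's method is used here only because it mirrors the "local rate" reasoning of the text.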

Let us now imagine that the liquid flowing out of our container flows into yet another container of the same capacity. This second container is initially (that is, at the moment t = 0) filled with pure water, and here also the liquid flows in and out at the rate of b liters per unit of time. It is obvious that salt is then constantly being introduced into and removed from this second container. We wish to know the number y of kilograms of salt present in the second container at any given time t.

Here again, we are given only a local (instantaneous) character- 
ization of the phenomenon. It is clear that in one unit of time 
as much salt flows into the second container as flows out of 
the first, namely, 


$$sx = sce^{-st}.$$
On the other hand, at the time t, every liter of the liquid in the second container holds y/a kg. of salt. The b liters which flow out of it in one unit of time therefore contain sy kg. of salt. Consequently, the total increase of the quantity of salt in the second container per unit of time at the moment t is

$$\frac{dy}{dt} = sce^{-st} - sy = s\left(ce^{-st} - y\right). \qquad (2)$$

You see that here again the local description of the phenomenon has found its mathematical expression in a differential equation. Again we have an expression for the derivative dy/dt of the unknown function. But this time the solution of the equation is not so easily found as before. Equation (2) gives an expression for the derivative dy/dt which contains the independent variable t as well as the unknown function y. We therefore cannot immediately separate the variables here as we could in equation (1). Thus, we are faced with an essentially new problem. In the given case it can, to be sure, be solved in a relatively simple way, but in the general case we have no method of solution.
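For the record, equation (2) with y(0) = 0 yields to a short device not used in the text. Multiplying by e^{st} gives d(ye^{st})/dt = sc, hence y = sct·e^{−st}. This function reaches its maximum c/e at t = 1/s.

The Python sketch below integrates (1) and (2) together as a system and compares y at t = 20 with this closed form. It uses the classical Runge-Kutta method (our choice of method). The values s = 0.05 and c = 10 are hypothetical; since 20 = 1/s, the computed value should be close to c/e.

```python
import math

def rk4_system(rhs, t0, state0, t1, steps):
    """Classical fourth-order Runge-Kutta method for d(state)/dt = rhs(t, state)."""
    h = (t1 - t0) / steps
    t, u = t0, list(state0)
    for _ in range(steps):
        k1 = rhs(t, u)
        k2 = rhs(t + h / 2, [ui + h / 2 * ki for ui, ki in zip(u, k1)])
        k3 = rhs(t + h / 2, [ui + h / 2 * ki for ui, ki in zip(u, k2)])
        k4 = rhs(t + h, [ui + h * ki for ui, ki in zip(u, k3)])
        u = [ui + h / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
             for ui, d1, d2, d3, d4 in zip(u, k1, k2, k3, k4)]
        t += h
    return u

s, c = 0.05, 10.0                      # hypothetical values, s = b / a

def tanks(t, u):
    x, y = u
    return [-s * x, s * x - s * y]     # equation (1), and equation (2) with sce^{-st} = sx

x_T, y_T = rk4_system(tanks, 0.0, [c, 0.0], 20.0, 2000)
y_closed = s * c * 20.0 * math.exp(-s * 20.0)      # y = s c t e^{-st} at t = 20
print(x_T, y_T, y_closed)
```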

What is the nature of this general case? How do we define the 
general concept of a differential equation? In our examples, we have 
been dealing with equations of the form 


$$\frac{dy}{dx} = f(x, y), \qquad (3)$$


where x is an independent variable and y an unknown function of the variable x; the function f(x, y) is, of course, given. The problem consists in finding a function y = φ(x) satisfying equation (3), that is, a function such that on an interval a < x < b we have identically

$$\varphi'(x) = f(x, \varphi(x)).$$


A somewhat more general type of differential equation is represented by the relation

$$F(x, y, y') = 0,$$

where again x is an independent variable, y is an unknown function of the variable x, and y′ = dy/dx. Just as before, we are looking for a function y = φ(x) identically satisfying the relation

$$F[x, \varphi(x), \varphi'(x)] = 0.$$
In addition, there exist cases in which the local characteristics of the phenomenon under consideration require for their expression not only the first derivative of the unknown function but also derivatives of higher orders. The differential equation then takes the form

$$F\left(x, y, y', y'', \ldots, y^{(n)}\right) = 0. \qquad (4)$$


Such an equation is called a differential equation of order n. It is the most general type of differential equation for problems which deal with only one unknown function and one independent variable. Such a situation occurs, however, only in the simplest cases; we frequently need to find several functions which depend upon several independent variables.

Let us turn first to the consideration of an arbitrary number of unknown functions y₁, y₂, ..., yₖ, dependent, however, only on one independent variable x. For the problem to be determinate, the local description must lead us to a system of differential equations equal in number to the number k of unknown functions. The general form of such a system of equations is

$$F_i\left(x, y_1, y_1', \ldots, y_1^{(m_1)}, y_2, y_2', \ldots, y_2^{(m_2)}, \ldots, y_k, y_k', \ldots, y_k^{(m_k)}\right) = 0,$$

where i = 1, 2, ..., k. Thus, we obtain the most general problem in the theory of so-called ordinary differential equations, the term used to denote differential equations containing only one independent variable.

A good example is to be found in the system of equations (with which you are undoubtedly familiar) describing the motion of a point mass:

$$m\frac{d^2x}{dt^2} = X\left(t, x, y, z, \frac{dx}{dt}, \frac{dy}{dt}, \frac{dz}{dt}\right),$$

$$m\frac{d^2y}{dt^2} = Y\left(t, x, y, z, \frac{dx}{dt}, \frac{dy}{dt}, \frac{dz}{dt}\right),$$

$$m\frac{d^2z}{dt^2} = Z\left(t, x, y, z, \frac{dx}{dt}, \frac{dy}{dt}, \frac{dz}{dt}\right).$$

Here the only independent variable is the time t, and the unknown functions are the coordinates x, y, and z of the point. The mass of this point is m, and X, Y, and Z are the components of the resultant of all the forces acting on it. In the general case these components depend on the time and on the position and velocity of the moving point. The initial values consist of the three coordinates and the three components of the velocity of this point at a certain specified initial moment t = t₀.
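A system of this kind is commonly handled by introducing the velocity components as additional unknowns. This turns the three second-order equations into six first-order ones, whose initial values are exactly the six numbers just listed.

The Python sketch below assumes a hypothetical force, uniform gravity (X = Y = 0, Z = −mg), and uses Euler's method. Both choices are ours, for illustration only, as are the helper names `point_mass_system` and `euler`.

```python
def point_mass_system(m, force):
    """Right-hand side of the first-order system equivalent to the three
    second-order equations; state = [x, y, z, vx, vy, vz]."""
    def rhs(t, state):
        x, y, z, vx, vy, vz = state
        fx, fy, fz = force(t, x, y, z, vx, vy, vz)
        return [vx, vy, vz, fx / m, fy / m, fz / m]
    return rhs

def euler(rhs, t0, state0, t1, steps):
    """Euler's method: advance every component with its current rate of change."""
    h = (t1 - t0) / steps
    t, state = t0, list(state0)
    for _ in range(steps):
        rates = rhs(t, state)
        state = [u + h * r for u, r in zip(state, rates)]
        t += h
    return state

# Hypothetical force: uniform gravity acting in the negative z direction.
m, g = 2.0, 9.81
def gravity(t, x, y, z, vx, vy, vz):
    return (0.0, 0.0, -m * g)

# Initial values at t0 = 0: three coordinates, then three velocity components.
initial = [0.0, 0.0, 0.0, 1.0, 0.0, 5.0]
x1, y1, z1, *_ = euler(point_mass_system(m, gravity), 0.0, initial, 1.0, 10_000)
z_exact = 5.0 * 1.0 - g * 1.0**2 / 2      # z(t) = vz0 t - g t^2 / 2 at t = 1
print(x1, y1, z1, z_exact)
```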

If there are several independent variables, then we are dealing 
with an equation (or a system of equations) with partial derivatives. 
The unknown functions are now functions of several variables and 
the equations naturally contain their partial derivatives with 
respect to these variables. The theory of partial differential equa- 
tions is, of course, much more complicated than the theory of or- 
dinary differential equations, and we shall not deal with it at all. 

But even the theory of ordinary differential equations does not 
abound in general methods sufficiently powerful to enable us to 
find solutions for wide classes of such equations. To some degree, 
this can be foreseen from the fact that even ordinary integration 
applied to elementary functions leads, in many cases, to non- 
elementary functions. All the more is this to be expected with more 
general and more complicated problems. Even if we take the point 
of view, customary in the theory of differential equations, that hav- 
ing reduced the solution of a problem to ordinary integrations, we 
may consider the problem solved, we shall still have progressed 
little, since such a reduction to quadratures can be performed only 
for a small number of the simplest (but, from the purely practical 
point of view, the most important) types of differential equations. 
We shall not discuss here nor even enumerate these types, as you 
will find all the material you could wish on the question in any ele- 
mentary textbook. We shall be concerned, rather, with more basic 
questions. 


60. THE EXISTENCE OF A SOLUTION 


We have already mentioned that the fundamental problem of the integral calculus may be considered as the simplest particular case of the solution of a differential equation. Namely, suppose that in the equation

$$\frac{dy}{dx} = f(x, y) \qquad (3)$$

the right side does not depend upon y. Then we obtain an equation of the form

$$\frac{dy}{dx} = f(x), \qquad (3')$$


whose solution is obviously equivalent to the integration of the 
function f(x). In connection with this problem, we had occasion to 




observe that even when the function f(x) is continuous, we have to 
prove the existence of the integral, that is, the existence of a solu- 
tion of the equation (3’). This was done by a special argument, 
which was based on the uniform continuity of f(x). If the function 
f(x) is bounded but discontinuous, then, generally speaking, the in- 
tegral does not exist. 

From all this it is evident that the problem of the existence of 
a solution to equation (3) is very likely to be highly compli- 
cated and certainly requires special investigation. And this applies, 
in an even greater degree, to those more general types of equations 
and systems of equations which we cited earlier. In order to throw 
into bold relief the basic features of this fundamental problem of 
the theory of differential equations, we must, as far as possible, free 
our exposition from the burden of purely technical complications. 
We shall therefore limit ourselves in what follows to the considera- 
tion of equations of type (3), that is, equations of the first order 
solved with respect to the derivative of the unknown function. 


THEOREM 1. Suppose that f(x, y) is continuous in a region D of the (x, y) plane. Then for every point (x₀, y₀) within this region there exists a function y = φ(x), satisfying equation (3) in a neighborhood of x₀, such that y₀ = φ(x₀).


It is clearly sufficient to prove our theorem for the case where 
the region D is bounded and closed, and the point (xo, yo) is an in- 
terior point of D. To carry out the proof, we shall need a lemma 
which is also of independent interest. 

Suppose we are given an infinite family of functions S = {F(x)}, defined on an interval [a, b]. The family of functions S will be said to be bounded on [a, b] if there exists a number M such that |F(x)| ≤ M for all x ∈ [a, b] and for all F(x) in S.


DEFINITION. We shall call the family S equicontinuous on [a, b] if the following holds. For any positive ε there exists a δ > 0 such that, for any function F(x) in S and any pair of points x₁ and x₂ in [a, b] satisfying the inequality |x₁ − x₂| < δ, we have

$$|F(x_1) - F(x_2)| < \varepsilon.$$


It is obvious that if the family S is bounded (or equicontinuous) on [a, b], then each of the functions belonging to it is bounded (or uniformly continuous) on this interval. The converse is, in general, not true. Boundedness and equicontinuity of the family S require more than the presence of the corresponding property in each function: this property must also be uniform for the totality of functions in the given family.

Let us now agree to define the spread of a family S of functions in the interval [a, b] as the least upper bound of all quantities |F₁(x) − F₂(x)|. Here x is any point in [a, b], and F₁ and F₂ are any two functions in S.


Lemma. Every infinite family S of functions which is bounded and 
equicontinuous on an interval I contains a sequence of functions uni- 
formly convergent in this interval. 


We shall carry out the proof in several consecutive stages. 


Proof. (i) We shall prove first that for any ε > 0 there exists a δ > 0 with the following property. For any subinterval J contained in I and of length less than δ, the family S contains an infinite subfamily S′ whose spread in J is less than ε.

Let M be a number such that |F(x)| ≤ M for all F in S and all x in I. Because of the equicontinuity of the family S on the interval I, we can find, for any positive integer n, a positive δ with the following property: the oscillation of any function belonging to S on any subinterval of length less than δ is smaller than M/n. Let J be a fixed subinterval of length less than δ, and consider any given function F(x) ∈ S. Let ξ and η denote the minimum and maximum values assumed by F(x) in J, so that η − ξ < M/n. Since ξ and η both lie in the interval [−M, M], there exists an integer k, −(n − 1) ≤ k ≤ n − 1, such that

$$\frac{k-1}{n}M \le \xi \le \eta \le \frac{k+1}{n}M.$$

Thus, we see that the values assumed by any function of the family S on the interval J are included in an interval of the form

$$\left[\frac{k-1}{n}M,\ \frac{k+1}{n}M\right], \qquad \text{where } -(n-1) \le k \le n-1.$$

There is only a finite number (namely 2n − 1) of these latter intervals, while the family S contains an infinite number of functions. Hence at least one of these intervals will contain all the values assumed in J by every function of some infinite subfamily S′ of the family S. It is obvious that the spread of the family S′ in the interval J does not exceed 2M/n. This spread will be less than ε if the number n is selected at the outset so that 2M/n < ε, with δ then chosen for this n.
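The counting argument in step (i) rests on a claim about windows. Any range [ξ, η] ⊂ [−M, M] with η − ξ < M/n fits into one of the 2n − 1 overlapping windows [(k − 1)M/n, (k + 1)M/n].

The minimal Python sketch below computes such a k and checks it on random rational data. It uses exact arithmetic, so that no rounding occurs at the window edges. The function names `covering_index` and `in_window` and the test values M = 5, n = 7 are our own.

```python
import math
import random
from fractions import Fraction

def covering_index(xi, eta, M, n):
    """Return an integer k with -(n-1) <= k <= n-1 and
    (k-1)M/n <= xi <= eta <= (k+1)M/n.

    Assumes -M <= xi <= eta <= M and eta - xi < M/n, as in step (i)."""
    j = math.floor(n * xi / M)        # j M/n <= xi < (j+1) M/n, and -n <= j <= n
    return max(-(n - 1), min(n - 1, j + 1))

def in_window(k, xi, eta, M, n):
    """True if [xi, eta] lies inside [(k-1)M/n, (k+1)M/n]."""
    return (k - 1) * M / n <= xi <= eta <= (k + 1) * M / n

# Exact rational arithmetic avoids rounding at the window edges.
rng = random.Random(0)
M, n = Fraction(5), 7
all_cases_ok = True
for _ in range(2000):
    xi = M * Fraction(rng.randint(-10**6, 10**6), 10**6)
    eta = min(M, xi + (M / n) * Fraction(rng.randint(0, 10**6 - 1), 10**6))
    k = covering_index(xi, eta, M, n)
    all_cases_ok &= (-(n - 1) <= k <= n - 1) and in_window(k, xi, eta, M, n)
print(all_cases_ok)
```

Since there are only 2n − 1 possible values of k, infinitely many functions must share one k. That is the pigeonhole step that produces the subfamily S′.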


(ii) We shall now show that the family S also contains an infinite subfamily S′ whose spread in the entire interval I is less than ε. For this purpose, let us subdivide the original interval I into subintervals J₁, J₂, ..., Jₘ, each of length less than δ. By what we have just established in (i), the family S contains an infinite subfamily S₁ whose spread is less than ε in J₁. The family S₁, being part of S, is bounded and equicontinuous with the same M and δ. For the same reason, therefore, S₁ contains an infinite subfamily S₂ whose spread is less than ε in J₂, and so on. Finally, we come to an infinite subfamily Sₘ = S′ of S. Its spread is less than ε in the interval Jₘ as well as in each of the preceding intervals, since

$$S' = S_m \subset S_{m-1} \subset \cdots \subset S_2 \subset S_1.$$

But this means that the spread of the family S′ is less than ε in the whole interval I, and our assertion is proved.

(iii) The proof of our lemma can now be completed quite simply. Let us denote by S₁ an infinite subfamily of S whose spread in I is less than 1. Similarly, let us denote by S₂ an infinite subfamily of S₁ whose spread in I is less than 1/2. In general, let us denote by Sₙ an infinite subfamily of Sₙ₋₁ whose spread is less than 1/n. (All these subfamilies exist by virtue of what we have just proved in (ii).)

Now choose the functions one at a time. Let F₁(x) be any function belonging to the family S₁, and F₂(x) any function belonging to the family S₂, different from F₁(x). In general, let Fₙ(x) be any function belonging to the family Sₙ and different from Fₙ₋₁(x), Fₙ₋₂(x), ..., F₁(x). Since Sₙ₊ₚ ⊂ Sₙ, the functions Fₙ and Fₙ₊ₚ are both contained in Sₙ. Consequently, for every point x in I we have

$$|F_n(x) - F_{n+p}(x)| < \frac{1}{n} \qquad (p = 1, 2, \ldots).$$

It follows, by the Cauchy condition, that the sequence of functions F₁(x), F₂(x), ..., Fₙ(x), ... converges uniformly in I, and the proof of the lemma is complete.
Let us now remark, for later use, that our lemma is by no means restricted to functions of one variable. Suppose, for example, that S is a family of functions bounded and equicontinuous on the rectangle I. Then, for each ε > 0, there exists a δ > 0 with the following property: the oscillation of any function F(Q) in S is less than ε on any rectangle J contained in I whose diagonal is of length less than δ. The proof of the lemma may now be applied verbatim. We conclude that there exists a sequence F₁(Q), F₂(Q), ... of functions in S converging uniformly in I.

Since the proof of the fundamental theorem, formulated earlier, 
will be based on an idea which is easiest to grasp in geometrical 
terms, it will be very useful before beginning the proof to see what 
form the problem (the finding of a solution for the given equation 
(3)) assumes geometrically. 

Graphical representations of the solutions y = φ(x) of equation (3) are usually called integral curves. The equation (3) associates with every point (x, y) of the region D a direction given by the slope dy/dx = f(x, y). The aggregate of points of D, together with their corresponding directions, forms the so-called field of linear elements of the equation (3). An integral curve of this equation is a curve such that the direction of its tangent at each point coincides with the direction of the field at that point.

Thus, consider the problem of finding a solution y = φ(x) of equation (3) such that y = y₀ for x = x₀. In geometrical terms, it becomes the problem of finding an integral curve of the given equation which passes through the point (x₀, y₀). To prove the existence of this solution means, therefore, to prove that through any given point there passes at least one integral curve.

The method that we are about to use for this purpose will con- 
sist of constructing auxiliary curves which approximate more and 
more closely the integral curve. This curve is then obtained by 
passing to the limit. 


Proof of Theorem 1. Let (x₀, y₀) be an interior point of a closed and bounded region D in which the function f(x, y) is continuous, and let M denote an upper bound of |f(x, y)| in this region. If the positive number a is sufficiently small, then the rectangle R,

$$x_0 - a \le x \le x_0 + a, \qquad y_0 - Ma \le y \le y_0 + Ma,$$

will be contained within D. We shall show that there exists a function y = φ(x) satisfying equation (3) for all x in [x₀ − a, x₀ + a] and such that φ(x₀) = y₀. To make the proof clearer, let us again divide it into separate steps.


(i) Let xₖ = x₀ + ka/n (0 ≤ k ≤ n), so that the points x₁, x₂, ..., xₙ₋₁ divide the interval [x₀, x₀ + a] into n equal parts. We now construct over the interval [x₀, x₀ + a] a broken line y = φₙ(x) (Fig. 34) with vertices having abscissas x₀, x₁, ..., xₙ. The slope of each segment is chosen to coincide with the direction of the field at the left end point of that segment. Clearly, this can be done in successive steps passing from xₖ₋₁ to xₖ.

[Fig. 34: the broken line y = φₙ(x) with vertices over the points of subdivision x₀, x₁, x₂, ..., xₙ]

The ordinate yₖ = φₙ(xₖ) will be given by the recursive formula

$$y_k = y_{k-1} + (x_k - x_{k-1})\,f(x_{k-1}, y_{k-1}) = y_{k-1} + \frac{a}{n}\,f(x_{k-1}, y_{k-1}) \qquad (1 \le k \le n), \qquad (5)$$

and the equation of the broken line over each interval xₖ₋₁ ≤ x ≤ xₖ will be given by the formula

$$\varphi_n(x) = y_{k-1} + (x - x_{k-1})\,f(x_{k-1}, y_{k-1}) \qquad (1 \le k \le n).$$


To prove that the whole broken line y = φₙ(x) from x = x₀ to x = x₀ + a is contained in D, it would clearly be sufficient to prove that

$$|y_k - y_0| \le Ma \qquad (0 \le k \le n).$$

We shall find it useful, however, to prove the stronger inequality

$$|y_k - y_0| \le \frac{k}{n}\,Ma \qquad (0 \le k \le n).$$
This inequality obviously holds for k = 0. Suppose it is true for a particular k < n. Then the corresponding point (xₖ, yₖ) is contained in R, hence in D, and therefore |f(xₖ, yₖ)| ≤ M. But in this case the recursive formula (5) gives

$$|y_{k+1} - y_0| \le |y_k - y_0| + \frac{a}{n}\,|f(x_k, y_k)| \le \frac{k+1}{n}\,Ma, \qquad (6)$$

so that our inequality is also true for the number k + 1. It follows that

$$|\varphi_n(x) - y_0| \le Ma \qquad (x_0 \le x \le x_0 + a), \qquad (7)$$

and the entire broken line y = φₙ(x) lies in D. Since this is true for every n, it follows that the family of functions φₙ(x) (n = 1, 2, ...) is bounded on the interval [x₀, x₀ + a].

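(ii) Let x′ and x″ be any two points in [x₀, x₀ + a], and, to be definite, let

$$x_{k-1} \le x' \le x_k \le \cdots \le x_{l-1} \le x'' \le x_l.$$

Then

$$\begin{aligned}
\varphi_n(x'') - \varphi_n(x') &= [\varphi_n(x'') - \varphi_n(x_{l-1})] + [\varphi_n(x_{l-1}) - \varphi_n(x_{l-2})] + \cdots + [\varphi_n(x_k) - \varphi_n(x')] \\
&= (x'' - x_{l-1})\,f(x_{l-1}, y_{l-1}) + \frac{a}{n}\,f(x_{l-2}, y_{l-2}) + \cdots + \frac{a}{n}\,f(x_k, y_k) \\
&\quad + (x_k - x')\,f(x_{k-1}, y_{k-1}). \qquad (8)
\end{aligned}$$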


We are now in a position to show that the family of functions φₙ(x) (n = 1, 2, ...) is equicontinuous on [x₀, x₀ + a]. The relation (8) gives

$$\begin{aligned}
|\varphi_n(x'') - \varphi_n(x')| &\le M\left[(x'' - x_{l-1}) + (l - k - 1)\frac{a}{n} + (x_k - x')\right] \\
&= M\left[(x'' - x_{l-1}) + (x_{l-1} - x_k) + (x_k - x')\right] \\
&= M(x'' - x'). \qquad (9)
\end{aligned}$$

From this we see directly that, for sufficiently small |x″ − x′| (for instance, |x″ − x′| < ε/M), we have |φₙ(x″) − φₙ(x′)| < ε. This holds for arbitrarily situated points x″ and x′ and for any n.
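The construction (5) is what is now usually called Euler's method. The Python sketch below builds φₙ and checks the estimates (6), (7), and (9) for several n, with a small tolerance for rounding.

It assumes the hypothetical field f(x, y) = cos(x + y), chosen only because |f| ≤ M = 1 everywhere. It also uses hypothetical helper names, `broken_line` and `phi`.

```python
import math
import random

def broken_line(f, x0, y0, a, n):
    """Vertices (x_k, y_k), 0 <= k <= n, of the broken line phi_n from formula (5)."""
    xs, ys = [x0], [y0]
    for k in range(1, n + 1):
        ys.append(ys[-1] + (a / n) * f(xs[-1], ys[-1]))   # slope taken at the left end point
        xs.append(x0 + k * a / n)
    return xs, ys

def phi(xs, ys, f, x):
    """phi_n(x) = y_{k-1} + (x - x_{k-1}) f(x_{k-1}, y_{k-1}) for x_{k-1} <= x <= x_k."""
    n = len(xs) - 1
    a = xs[-1] - xs[0]
    k = min(n, max(1, math.ceil((x - xs[0]) * n / a)))
    return ys[k - 1] + (x - xs[k - 1]) * f(xs[k - 1], ys[k - 1])

# Hypothetical field f(x, y) = cos(x + y), for which |f| <= M = 1 everywhere.
f = lambda x, y: math.cos(x + y)
x0, y0, a, M = 0.0, 0.5, 1.0, 1.0
rng = random.Random(1)
bound_6_ok = bound_7_ok = bound_9_ok = True
for n in (5, 50, 500):
    xs, ys = broken_line(f, x0, y0, a, n)
    # (6): |y_k - y_0| <= (k/n) M a at every vertex
    bound_6_ok &= all(abs(ys[k] - y0) <= k / n * M * a + 1e-12 for k in range(n + 1))
    for _ in range(200):
        x1, x2 = sorted(rng.uniform(x0, x0 + a) for _ in range(2))
        p1, p2 = phi(xs, ys, f, x1), phi(xs, ys, f, x2)
        bound_7_ok &= abs(p1 - y0) <= M * a + 1e-12           # (7)
        bound_9_ok &= abs(p2 - p1) <= M * (x2 - x1) + 1e-12   # (9)
print(bound_6_ok, bound_7_ok, bound_9_ok)
```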


(iii) The family {φₙ(x)} is bounded and equicontinuous on [x₀, x₀ + a]. It follows from our lemma that it must contain a sequence of functions converging uniformly in this interval. In what follows we shall have to deal with this sequence only, so we can denote it by φ₁(x), φ₂(x), ..., φₙ(x), .... Thus, as n → ∞ the functions φₙ(x) tend to a limit φ(x), uniformly for x₀ ≤ x ≤ x₀ + a. We shall now show that the function φ(x), thus defined, satisfies all the conditions of our theorem. Since φₙ(x₀) = y₀ for every n, it is immediately apparent that φ(x₀) = y₀.

(iv) We shall now verify that, for x₀ ≤ x ≤ x₀ + a, the function φ(x) satisfies the differential equation (3). Let ε be any fixed positive number. Since the function f(x, y) is continuous and the region D is closed and bounded, f(x, y) is uniformly continuous in D. Hence there exists a δ > 0 such that, for any two points (ξ₁, η₁) and (ξ₂, η₂) in D related by the inequalities

$$|\xi_2 - \xi_1| < \delta \quad\text{and}\quad |\eta_2 - \eta_1| < 2M\delta,$$

the inequality

$$|f(\xi_2, \eta_2) - f(\xi_1, \eta_1)| < \varepsilon$$

is also satisfied.

Let x′ ≤ xᵢ ≤ x″, where xᵢ is any of the points of subdivision of the interval [x₀, x₀ + a] corresponding to the function φₙ(x). Setting, as before, φₙ(xᵢ) = yᵢ, we have from (9) that

$$|y_i - \varphi_n(x')| \le M(x_i - x') \le M(x'' - x');$$

and since, for sufficiently large n, we have

$$|\varphi_n(x') - \varphi(x')| < M(x'' - x'),$$

we also have

$$|y_i - \varphi(x')| < 2M(x'' - x').$$

Therefore, if 0 < x″ − x′ < δ, then for sufficiently large n and for x′ ≤ xᵢ ≤ x″ we have

$$|x_i - x'| < \delta \quad\text{and}\quad |y_i - y'| < 2M\delta,$$

where yᵢ = φₙ(xᵢ) and y′ = φ(x′). From this, in accordance with the definition of the number δ, we obtain for x′ ≤ xᵢ ≤ x″ and for sufficiently large n

$$|f(x_i, y_i) - f(x', y')| < \varepsilon,$$

or

$$f(x', y') - \varepsilon < f(x_i, y_i) < f(x', y') + \varepsilon.$$
Let us apply this estimate to every term¹ on the right side of (8). We see that for 0 < x″ − x′ < δ and for sufficiently large n we have the inequalities

$$[f(x', y') - \varepsilon](x'' - x') < \varphi_n(x'') - \varphi_n(x') < [f(x', y') + \varepsilon](x'' - x').$$

Both the left and the right side of these inequalities are independent of n. Passing to the limit, we find that, under the sole condition 0 < x″ − x′ < δ, we have

$$[f(x', y') - \varepsilon](x'' - x') \le \varphi(x'') - \varphi(x') \le [f(x', y') + \varepsilon](x'' - x'),$$

or, what is the same thing,

$$f(x', y') - \varepsilon \le \frac{\varphi(x'') - \varphi(x')}{x'' - x'} \le f(x', y') + \varepsilon.$$

But this means that

$$\varphi'(x) = f(x, \varphi(x)) \qquad (x_0 \le x \le x_0 + a),$$

where, for x = x₀, φ′(x) denotes the right-hand derivative.

(v) Finally, we can clearly obtain the same result for the interval [x₀ − a, x₀] in a completely analogous manner. The function φ(x) therefore satisfies all the conditions of our theorem, which is thus proved.


The functions y = φₙ(x) which we have constructed have broken lines as their graphs. The direction of each segment of such a line coincides with the direction of the field at the left end point of this segment. The lengths of the segments decrease indefinitely as n increases. Thus, the larger the value of n, the denser is the set of points at which the direction of the broken line coincides with the direction of the field. It is therefore quite natural to expect the following: if these broken lines have a limiting curve (as n → ∞), its direction will coincide at every one of its points with the direction of the field; that is, it will be an integral curve. Our lemma established the existence of a limiting function, if not for the whole sequence of functions φₙ(x), then at least for some subsequence, which is sufficient for our purpose. Thus, we had only to confirm the correctness of our expectation by a rigorous and precise argument.
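The convergence described above can be watched in a case where the solution is known. Take the hypothetical test equation dy/dx = y with y(0) = 1, whose solution is eˣ. For it, formula (5) gives φₙ(1) = (1 + 1/n)ⁿ, which tends to e. The Python sketch below (helper name `euler_endpoint` ours) computes φₙ(1) by (5) for several n. In this example the whole sequence converges, whereas the general argument only guarantees a convergent subsequence.

```python
import math

def euler_endpoint(f, x0, y0, a, n):
    """Ordinate y_n = phi_n(x0 + a) of the broken line given by formula (5)."""
    h = a / n
    x, y = x0, y0
    for _ in range(n):
        y += h * f(x, y)     # follow the field direction at the left end point
        x += h
    return y

# Hypothetical test equation dy/dx = y, y(0) = 1, with exact solution e^x.
errors = []
for n in (10, 100, 1000, 10000):
    y_n = euler_endpoint(lambda x, y: y, 0.0, 1.0, 1.0, n)
    errors.append(abs(y_n - math.e))
    print(n, y_n, errors[-1])
```

The error shrinks roughly in proportion to 1/n.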


¹That f(xₖ₋₁, yₖ₋₁) in (8) also lies between f(x′, y′) − ε and f(x′, y′) + ε follows by the same argument, since for sufficiently large n we have |x′ − xₖ₋₁| < δ.

61. UNIQUENESS OF THE SOLUTION


The importance of Theorem 1 in section 60, guaranteeing a solu- 
tion for the equation (3) under certain conditions, is self-evident. 
When in practice we find it necessary to solve a differential equa- 
tion, we make efforts, sometimes strenuous efforts, to reduce 
its solution to quadratures (ordinary integrations). But if this fails, 
we usually apply one or another device of approximate computa- 
tion. For our activity to make sense, we need the assurance that the 
object we are seeking actually exists; all our efforts would be 
wasted if our equation had no solutions at all. 

A no less important question is that of the uniqueness of 
the solution. For, even if the problem under consideration leads to 
the conclusion that the function φ(x) must satisfy equation (3) and take the value y₀ at x = x₀, we still cannot consider our original problem as solved. We must also be assured that there exists no other solution of equation (3) satisfying the same initial condition φ(x₀) = y₀. For, if there were several such solutions, we could not
be sure that the particular one we have found is the one which 
solves our original problem. Even if we had found all the solutions 
of equation (3), still, generally speaking, we would have no way of 
knowing which one of them corresponds to our original problem. 

It is noteworthy that the condition which permitted us to prove the existence of a solution (namely, the continuity of the function $f(x, y)$ in the given region) cannot assure the uniqueness of this solution. There are cases in which the function $f(x, y)$ is continuous in the region $D$ and where, nevertheless, there exist several solutions of equation (3) which take the value $y_0$ at $x = x_0$. It is possible, however, to obtain uniqueness of the solution if we require of the function $f(x, y)$ something more than simple continuity. One of the most useful forms of such a strengthened condition, called a Lipschitz condition, states that there exists a constant $k$ such that


$$|f(x, y_1) - f(x, y_2)| \le k\,|y_1 - y_2| \qquad \text{(A)}$$


for any two points $(x, y_1)$ and $(x, y_2)$ in the region $D$. To be sure, this
requirement is not the weakest that might be formulated. But it is 
fulfilled in the majority of cases encountered in practice, and 
its simplicity makes it very convenient in application. So, we prove: 


THEOREM 2. If $f(x, y)$ is continuous in $D$ and satisfies condition (A), then the solution $y = \varphi(x)$ of equation (3) which takes the value $y_0$ for $x = x_0$ is unique.
Proof. Suppose that each of the two functions $\varphi_1(x)$ and $\varphi_2(x)$ satisfies equation (3) for $x_0 - a < x < x_0 + a$ and that $\varphi_1(x_0) = \varphi_2(x_0) = y_0$. Let $\varphi_2(x) - \varphi_1(x) = \omega(x)$, so that $\omega(x_0) = 0$. Then, for $x_0 - a < x < x_0 + a$, we have

$$\left|\frac{d\omega}{dx}\right| = \left|\frac{d\varphi_2}{dx} - \frac{d\varphi_1}{dx}\right| = |f[x, \varphi_2(x)] - f[x, \varphi_1(x)]| \le k\,|\varphi_2(x) - \varphi_1(x)| = k\,|\omega(x)|. \qquad (10)$$


We shall denote by $\mu$ the maximum value of $|\omega(x)|$ in the smaller of the two intervals $\left[x_0 - \frac{1}{2k}, x_0 + \frac{1}{2k}\right]$ and $[x_0 - a, x_0 + a]$. Let us denote this interval by $[x_0 - r, x_0 + r]$, where $r$ denotes the lesser of the numbers $a$ and $\frac{1}{2k}$. Because of the continuity of $|\omega(x)|$, its maximum value on this interval is attained at some definite point $x = x_1$, that is, $|\omega(x_1)| = \mu$. Applying relation (10) and the first mean value theorem, we find¹
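$$\mu = |\omega(x_1)| = |\omega(x_1) - \omega(x_0)| = \left|\int_{x_0}^{x_1}\frac{d\omega}{dx}\,dx\right| \le \left|\int_{x_0}^{x_1}\left|\frac{d\omega}{dx}\right|dx\right| \le k\left|\int_{x_0}^{x_1}|\omega(x)|\,dx\right| \le k\mu\,|x_1 - x_0| \le k\mu r \le \frac{\mu}{2},$$

whence $\mu = 0$. This means that $\omega(x) = 0$ for $x_0 - r \le x \le x_0 + r$.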


If $r = a$, then the theorem is proved. If, however, $r = \frac{1}{2k} < a$, then, at any rate,

$$\omega\left(x_0 - \frac{1}{2k}\right) = \omega\left(x_0 + \frac{1}{2k}\right) = 0,$$

and we can then repeat the reasoning, taking in succession as initial points, instead of $x = x_0$, the points $x_0 - \frac{1}{2k}$ and $x_0 + \frac{1}{2k}$. This will permit us to assert that $\omega(x) = 0$ in the lesser of the two intervals $\left[x_0 - \frac{2}{2k}, x_0 + \frac{2}{2k}\right]$ and $[x_0 - a, x_0 + a]$. Repeating this process a sufficient number of times, we shall clearly extend the interval in which we know $\omega(x) = 0$ until it coincides with $[x_0 - a, x_0 + a]$, thus proving the theorem.
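The text does not give an example of the non-uniqueness mentioned above; a standard one, added here only for illustration, is $f(x, y) = 3y^{2/3}$ (with the real cube root). This function is continuous everywhere, yet both $y \equiv 0$ and $y = x^3$ satisfy $y' = 3y^{2/3}$ and vanish at $x = 0$. Condition (A) fails near $y = 0$, since $|f(x, y) - f(x, 0)|/|y| = 3/|y|^{1/3}$ is unbounded. The Python sketch below checks both solutions at sample points and prints the growing quotient.

```python
def f(x, y):
    """f(x, y) = 3 * y**(2/3), using the real cube root so y < 0 is allowed."""
    return 3.0 * (abs(y) ** (1.0 / 3.0)) ** 2


def residual(phi, dphi, xs):
    """Largest |phi'(x) - f(x, phi(x))| over the sample points xs."""
    return max(abs(dphi(x) - f(x, phi(x))) for x in xs)


xs = [i / 10 for i in range(21)]  # sample points in [0, 2]

# Two different solutions with phi(0) = 0.
zero_res = residual(lambda x: 0.0, lambda x: 0.0, xs)
cubic_res = residual(lambda x: x ** 3, lambda x: 3 * x ** 2, xs)
print(zero_res, cubic_res)  # both residuals vanish up to rounding error

# The quotient |f(x, y) - f(x, 0)| / |y| equals 3 / y**(1/3) for y > 0 and is
# unbounded as y -> 0+, so no constant k in condition (A) works near y = 0.
for y in (1e-3, 1e-6, 1e-9):
    print(y, abs(f(0.0, y) - f(0.0, 0.0)) / y)
```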


¹ We take the absolute value of the integral in view of the possibility that $x_1 < x_0$.
62. DEPENDENCE OF THE SOLUTION ON PARAMETERS


When a differential equation arises out of some concrete prob- 
lem, this equation usually contains several parameters, represent- 
ing the values of the constants which determine the specific condi- 
tions of the phenomenon under consideration. Thus, in the problem 
which we used as an example at the beginning of this lecture, the 
container capacity a, the velocity b of fluid flow, and the initial 
amount c of salt in the first container are such quantities. All three 
of these quantities enter naturally into the equation which we con- 
structed, and, of course, every solution of this equation will depend 
upon all of these parameters. Thus, if we wish to stress this 
dependence, we have to write equation (3) in the form 

$$\frac{dy}{dx} = f(x, y, p_1, p_2, \ldots, p_r),$$

and its solution in the form

$$y = \varphi(x, p_1, p_2, \ldots, p_r),$$

where $p_1, p_2, \ldots, p_r$ are the parameters of the problem.

It is not hard to see that in applied mathematics the nature of the dependence of the solutions of the differential equation on the parameters is of essential importance, and that the establishment of the continuity of this dependence is particularly important. This is especially so since the values of the parameters, as well as the values of $x$, are usually obtained as the result of one or another kind of physical measurement, and are usually not given with absolute accuracy, but only as an approximation, with some error, no matter how small. Therefore, if the values of $\varphi(x, p_1, p_2, \ldots, p_r)$ could change considerably for very small increments of the parameters and of $x$, the resulting solutions would be completely useless in practice. The only solutions useful for practical purposes are those for which, knowing the approximate values of the parameters and of $x$, we may also find approximate values of the function. The precise mathematical expression of this property is nothing more than the continuity of the function $\varphi(x, p_1, p_2, \ldots, p_r)$ with respect to $x, p_1, p_2, \ldots, p_r$. Now, we prove:


THEOREM 3. If $f(x, y, p_1, p_2, \ldots, p_r)$ is continuous in a region of the $(x, y)$ plane, and is also continuous with respect to each of the parameters $p_1, p_2, \ldots, p_r$, then the fulfillment of the conditions under which we proved the existence and uniqueness of a solution of (3) will ensure that this solution is a continuous function of $x, p_1, p_2, \ldots, p_r$.
Since we can treat continuity with respect to each parameter separately, we may confine ourselves, for the sake of simplicity, to the case where the function $f$ (and thus the function $\varphi$ also) depends upon only one parameter $p$.


Proof. We may confine our attention to a closed and bounded neighborhood $D$ of $(x_0, y_0)$. Then on $D$, and with $p$ restricted to the interval $d = [\lambda_1, \lambda_2]$, $f$ is uniformly continuous. By our assumption (A) we have

$$|f(x, y_1, p) - f(x, y_2, p)| \le k\,|y_1 - y_2| \qquad \text{(A')}$$

for $(x, y_1)$ and $(x, y_2)$ in $D$ and $p \in d$. We assert that in this case the unique solution $y = \varphi(x, p)$ of the equation

$$\frac{dy}{dx} = f(x, y, p), \qquad \text{(3'')}$$

which takes the value $y_0$ at $x = x_0$, is continuous with respect to $x$ and $p$ for $(x, p)$ in the rectangle $x_0 - a \le x \le x_0 + a$, $\lambda_1 \le p \le \lambda_2$.

To prove this, we shall have to return to the construction used earlier to prove the existence of a solution for equation (3). We investigate the nature of the dependence on the parameter $p$ of the functions $\varphi_n(x)$ which we constructed there (and which we shall now write in the form $\varphi_n(x, p)$). Since the quantities $y_i$ defined by the recursive formula (5) also depend on $p$ (that is, $y_i = y_i(p)$), formula (5) takes the form

$$y_i(p) = y_{i-1}(p) + \frac{a}{n}\,f[x_{i-1}, y_{i-1}(p), p]. \qquad (11)$$
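By virtue of the uniform continuity of $f(x, y, p)$, given any $\varepsilon > 0$ there exists a $\delta > 0$ such that

$$|f(x, y, p + h) - f(x, y, p)| < \varepsilon \qquad (12)$$

whenever $|h| < \delta$, $p \in d$, $p + h \in d$, and $(x, y) \in D$. From (11) we have

$$y_i(p + h) - y_i(p) = y_{i-1}(p + h) - y_{i-1}(p) + \frac{a}{n}\left\{f[x_{i-1}, y_{i-1}(p + h), p + h] - f[x_{i-1}, y_{i-1}(p), p]\right\},$$

where $1 \le i \le n$.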
Let us write, for brevity,

$$y_i(p + h) - y_i(p) = \Delta_i \qquad (0 \le i \le n),$$

so that

$$\Delta_i = \Delta_{i-1} + \frac{a}{n}\left\{f[x_{i-1}, y_{i-1}(p + h), p + h] - f[x_{i-1}, y_{i-1}(p), p]\right\}, \qquad (13)$$

where $1 \le i \le n$. The expression within the braces may be written in the form

$$f[x_{i-1}, y_{i-1}(p + h), p + h] - f[x_{i-1}, y_{i-1}(p), p + h] + f[x_{i-1}, y_{i-1}(p), p + h] - f[x_{i-1}, y_{i-1}(p), p].$$

By virtue of (A'), the absolute value of the first of these differences does not exceed $k|\Delta_{i-1}|$, and by (12), the absolute value of the second difference is less than $\varepsilon$. Thus, we obtain


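$$|\Delta_i| \le |\Delta_{i-1}| + \frac{a}{n}\left(k|\Delta_{i-1}| + \varepsilon\right) = \frac{\varepsilon a}{n} + |\Delta_{i-1}|\left(1 + \frac{ka}{n}\right) \qquad (1 \le i \le n).$$

Applying the same estimate to the quantity $|\Delta_{i-1}|$ appearing on the right side, and repeating this process down to $\Delta_0 = 0$ (the initial value $y_0$ does not depend on $p$), we arrive at the relation

$$|\Delta_i| \le \frac{\varepsilon a}{n}\left[1 + \left(1 + \frac{ka}{n}\right) + \left(1 + \frac{ka}{n}\right)^2 + \cdots + \left(1 + \frac{ka}{n}\right)^{i-1}\right] = \frac{\varepsilon}{k}\left[\left(1 + \frac{ka}{n}\right)^i - 1\right] \le \frac{\varepsilon}{k}\left[\left(1 + \frac{ka}{n}\right)^n - 1\right] < \frac{\varepsilon}{k}\left(e^{ka} - 1\right),$$

where $1 \le i \le n$. Hence, for $|h| < \delta$ we have

$$|y_i(p + h) - y_i(p)| < \frac{\varepsilon}{k}\left(e^{ka} - 1\right)$$

or, what is the same thing,

$$|\varphi_n(x_i, p + h) - \varphi_n(x_i, p)| < \frac{\varepsilon}{k}\left(e^{ka} - 1\right).$$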


But the functions $\varphi_n(x, p)$ and $\varphi_n(x, p + h)$ are linear on each of the subintervals $[x_{i-1}, x_i]$. Therefore, the inequality

$$|\varphi_n(x, p + h) - \varphi_n(x, p)| < \frac{\varepsilon}{k}\left(e^{ka} - 1\right),$$

which is satisfied (as we have just seen) for $|h| < \delta$ at the end points of every subinterval, must necessarily be satisfied within each subinterval, and thus in the entire interval $[x_0 - a, x_0 + a]$. In view of the arbitrariness of $\varepsilon$, this means that the family of functions $\varphi_n(x, p)$ is equicontinuous on the rectangle $x_0 - a \le x \le x_0 + a$, $\lambda_1 \le p \le \lambda_2$. Let us denote this family by $S$. By the lemma on page 213 (see also the remark on page 215), the sequence $S$ contains a subsequence $S'$ which converges uniformly in the rectangle $x_0 - a \le x \le x_0 + a$, $\lambda_1 \le p \le \lambda_2$ to the solution $\varphi(x, p)$. Since the functions in the sequence are continuous with respect to $x$ and $p$, it follows from the proposition on the continuity of the limit of a uniformly convergent sequence of continuous functions (of any number of variables) that the same holds true for $\varphi(x, p)$, which is what we sought to prove.
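The estimate used in this proof can be observed numerically. In this Python sketch the field $f(x, y, p) = \sin y + px$ is a hypothetical example, not from the text, with $x_0 = y_0 = 0$, $a = 1$ and $p = 0.5$. It satisfies (A') with $k = 1$, and since $0 \le x \le a$ we may take $\varepsilon = |h|a$ in (12). The code builds the ordinates $y_i(p)$ by the recursion (11) and compares the largest $|y_i(p + h) - y_i(p)|$ with $\frac{\varepsilon}{k}(e^{ka} - 1)$.

```python
import math


def vertex_values(f, x0, y0, a, n, p):
    """Ordinates y_i(p), 0 <= i <= n, built by the recursion
    y_i(p) = y_{i-1}(p) + (a/n) * f(x_{i-1}, y_{i-1}(p), p)."""
    h = a / n
    ys = [y0]
    for i in range(1, n + 1):
        ys.append(ys[-1] + h * f(x0 + (i - 1) * h, ys[-1], p))
    return ys


def f(x, y, p):
    """Hypothetical field: Lipschitz in y with k = 1; |f(x,y,p+h) - f(x,y,p)| = |h*x|."""
    return math.sin(y) + p * x


def gap_and_bound(h, n, x0=0.0, y0=0.0, a=1.0, p=0.5, k=1.0):
    """Largest |y_i(p + h) - y_i(p)| and the bound (eps/k)(e^{ka} - 1).

    eps = |h|*a is valid here because x0 = 0, so 0 <= x <= a on the grid.
    """
    eps = abs(h) * a
    ys_p = vertex_values(f, x0, y0, a, n, p)
    ys_ph = vertex_values(f, x0, y0, a, n, p + h)
    gap = max(abs(u - v) for u, v in zip(ys_ph, ys_p))
    bound = eps / k * (math.exp(k * a) - 1.0)
    return gap, bound


if __name__ == "__main__":
    for h in (1e-1, 1e-2, 1e-3):
        print(h, gap_and_bound(h, 1000))
```

The bound does not depend on $n$, which is exactly what makes the family $\varphi_n(x, p)$ equicontinuous in $p$.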


As we know, every solution $\varphi(x)$ of equation (3) is, under our assumptions, uniquely determined by the value $y_0$ which it assumes at $x = x_0$. The function $\varphi(x)$ changes with any change of the numbers $x_0$ and $y_0$, so that, in essence, it is a function $\varphi(x, x_0, y_0)$ of three independent variables. For the same reasons which we mentioned earlier, the nature of the dependence of the solution $\varphi(x, x_0, y_0)$ on these initial conditions $x_0$ and $y_0$ is of essential importance. At first glance, it may appear that this is a new problem, one which cannot be reduced to the problem considered above (the dependence of the solution on parameters), since the numbers $x_0$ and $y_0$ clearly do not appear as arguments of the function $f(x, y)$. But such a reduction is actually possible. If in equation (3) we transform the independent variable, as well as the unknown function, with the aid of the relationships

$$x = x_0 + x^*, \qquad y = y_0 + y^*,$$

where $x^*$ is a new independent variable and $y^*$ a new unknown function, equation (3) takes the form

$$\frac{dy^*}{dx^*} = f(x_0 + x^*, y_0 + y^*). \qquad \text{(3''')}$$

It will then be necessary to find the solution of this equation which takes the value $y^* = 0$ at $x^* = 0$. Here, $x_0$ and $y_0$ explicitly appear as parameters on the right side of the equation. Let

$$y^* = \varphi^*(x^*, x_0, y_0) \qquad (14)$$

be the solution of equation (3'''). It is continuous with respect to $x_0$ and $y_0$ by virtue of the theorem which we have just proved, since the function $f(x_0 + x^*, y_0 + y^*)$, being continuous with respect to each of its two arguments, is automatically continuous with respect to $x_0$ and $y_0$. But, in order to obtain the required solution of equation (3), we clearly have only to pass from the new variables to the old ones in expression (14), which gives us

$$y = \varphi(x, x_0, y_0) = y_0 + \varphi^*(x - x_0, x_0, y_0),$$

from which it follows directly that the solution is continuous with respect to the initial values $x_0$ and $y_0$.
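The reduction to the parameter problem can be checked with broken lines as well. In this Python sketch the right side $f(x, y) = x^2 - y$ and all function names are illustrative assumptions. `phi_direct` approximates $\varphi(x, x_0, y_0)$ starting from $(x_0, y_0)$, while `phi_via_shift` solves the shifted equation for $y^*$ with $y^* = 0$ at $x^* = 0$ and then returns $y_0 + \varphi^*(x - x_0, x_0, y_0)$. With the same division of the interval the two agree up to rounding, and a small change of $x_0$ or $y_0$ changes the result only slightly.

```python
def euler_endpoint(g, x_start, y_start, length, n):
    """Ordinate at x_start + length of the broken line for y' = g(x, y)."""
    h = length / n
    x, y = x_start, y_start
    for _ in range(n):
        y += h * g(x, y)
        x += h
    return y


def f(x, y):
    """Hypothetical sample right side of equation (3)."""
    return x * x - y


def phi_direct(x, x0, y0, n=2000):
    """Approximate phi(x, x0, y0) by solving (3) from (x0, y0) directly."""
    return euler_endpoint(f, x0, y0, x - x0, n)


def phi_via_shift(x, x0, y0, n=2000):
    """Approximate phi(x, x0, y0) as y0 + phi*(x - x0, x0, y0)."""
    def f_star(x_star, y_star):
        # x0 and y0 now enter only as parameters of the right side.
        return f(x0 + x_star, y0 + y_star)
    return y0 + euler_endpoint(f_star, 0.0, 0.0, x - x0, n)


if __name__ == "__main__":
    print(phi_direct(1.5, 0.2, 1.0), phi_via_shift(1.5, 0.2, 1.0))
    # A small change of the initial values gives a small change of the solution.
    print(phi_via_shift(1.5, 0.2 + 1e-6, 1.0 - 1e-6))
```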


63. CHANGE OF VARIABLES 


As you know, one of the most effective methods for simplifying 
integration problems is the transformation of the variable of inte- 
gration (the so-called method of integration by substitution). This 
method also constitutes one of our most important devices for 
solving differential equations, and its flexibility is considerably in- 
creased by the fact that both the independent variable and the 
unknown function can be subjected to transformation. As we saw 
at the beginning of this lecture, the solution of a differential equa- 
tion can easily be reduced to quadratures if the variables can be 
separated, that is, if by multiplication and division the equation is 
reducible to the form $M(y)\,dy + N(x)\,dx = 0$, where each of the
two terms contains only one of the variables x and y. The transfor- 
mation of variables is often useful in this regard. By transforming 
either the independent variable x or the unknown function y, 
or both simultaneously, we are frequently able to replace an equa- 
tion in which the variables could not be separated by a new equa- 
tion in which it has become possible to separate them. Although the 
class of differential equations which can be reduced to quadratures 
by this means is very limited, nevertheless it contains quite a num- 
ber of the simplest types, which are, by that very token, the most 
frequently encountered in practice. As a consequence, the method 
of transformation of variables acquires great practical importance. 

In the class of equations to which the method of transformation is applicable there are, first of all, all linear first-order differential equations, that is, equations in which the unknown function $y$, as well as its derivative $y'$, appear in the first power only. The general form of such an equation is

$$y' + f_1(x)\,y + f_2(x) = 0, \qquad (15)$$

where $f_1(x)$ and $f_2(x)$ are given continuous functions of the variable $x$. All equations of this type can be reduced by a single method to a form in which the variables are separable.


To see this, let us consider first the equation

$$z' + f_1(x)\,z = 0, \qquad (16)$$

where $z$ denotes the unknown function. This is an equation of the same form as (15), except that it is without a free term $f_2(x)$. The variables in this equation can be separated immediately:

$$\frac{dz}{z} + f_1(x)\,dx = 0,$$

and integration gives

$$\ln\left|\frac{z}{z_0}\right| + \int_{x_0}^{x} f_1(u)\,du = 0.$$

For our purposes, it is enough to have any one solution of equation (16). We therefore set $z_0 = 1$, obtaining

$$\ln|z| + \int_{x_0}^{x} f_1(u)\,du = 0,$$

whence

$$z = e^{-\int_{x_0}^{x} f_1(u)\,du} = \varphi(x). \qquad (17)$$

Thus, we find a solution of equation (16) with the aid of one quadrature.


To reduce the general equation (15) to quadratures, we shall transform the unknown function $y$ by the substitution $y = \varphi(x)\,y^*$, where $y^*$ is a new unknown function and $\varphi(x)$ is the solution of (16) given by formula (17). Equation (15) then takes the form

$$\varphi'(x)\,y^* + \varphi(x)\,y^{*\prime} + f_1(x)\,\varphi(x)\,y^* + f_2(x) = 0,$$

or

$$\varphi(x)\,y^{*\prime} + f_2(x) = 0,$$

since, from equation (16), we have

$$\varphi'(x) + f_1(x)\,\varphi(x) = 0.$$

Hence,

$$\frac{dy^*}{dx} = -\frac{f_2(x)}{\varphi(x)} = -f_2(x)\,e^{\int_{x_0}^{x} f_1(u)\,du},$$

and, consequently,

$$y^* = -\int_{x_0}^{x} f_2(v)\,e^{\int_{x_0}^{v} f_1(u)\,du}\,dv + C,$$

where $C$ is a constant of integration.

Finally,

$$y = \varphi(x)\,y^* = e^{-\int_{x_0}^{x} f_1(u)\,du}\left[C - \int_{x_0}^{x} f_2(v)\,e^{\int_{x_0}^{v} f_1(u)\,du}\,dv\right].$$

This is the general solution of equation (15), from which, by appropriate choice of the constant $C$, we can obtain all its particular solutions. If, for example, we wish to have $y = y_0$ for $x = x_0$, then from the general solution we find that $y_0 = C$, and the particular solution takes the form

$$y = e^{-\int_{x_0}^{x} f_1(u)\,du}\left[y_0 - \int_{x_0}^{x} f_2(v)\,e^{\int_{x_0}^{v} f_1(u)\,du}\,dv\right]. \qquad (18)$$
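Formula (18) can be evaluated numerically when the quadratures cannot be carried out in closed form. The following Python sketch is only an illustration: it replaces both integrals by the composite trapezoidal rule (our choice, not the text's), and the helper names are hypothetical. For $f_1(x) = 1$, $f_2(x) = -1$, $x_0 = 0$, $y_0 = 0$, formula (18) gives $y = 1 - e^{-x}$, which the sketch reproduces to within the quadrature error.

```python
import math


def trapezoid(g, lo, hi, m):
    """Composite trapezoidal approximation of the integral of g from lo to hi."""
    step = (hi - lo) / m
    total = 0.5 * (g(lo) + g(hi))
    for j in range(1, m):
        total += g(lo + j * step)
    return total * step


def linear_solution(f1, f2, x0, y0, x, m=400):
    """Approximate value at x of formula (18) for y' + f1(x) y + f2(x) = 0, y(x0) = y0.

    Both quadratures in (18) are replaced by the trapezoidal rule with m steps.
    """
    def F1(s):
        # integral of f1 from x0 to s
        return trapezoid(f1, x0, s, m)

    inner = trapezoid(lambda v: f2(v) * math.exp(F1(v)), x0, x, m)
    return math.exp(-F1(x)) * (y0 - inner)


if __name__ == "__main__":
    # f1 = 1, f2 = -1, y(0) = 0: formula (18) gives y = 1 - e^{-x}.
    print(linear_solution(lambda x: 1.0, lambda x: -1.0, 0.0, 0.0, 1.0), 1 - math.exp(-1))
```

The nested call of `F1` inside the second integral mirrors the structure of (18): the inner quadrature of $f_1$ is needed at every point $v$ of the outer one.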


We see, then, that with the aid of an appropriately selected 
transformation of the unknown function, the solution of equation 
(15) is reduced to two successive quadratures. As an example, we 
shall carry out completely the solution of the problem which 
we considered at the beginning of this lecture. At that time we 
obtained the equation 

$$\frac{dy}{dt} + sy - sce^{-st} = 0.$$

Here, $f_1(t) = s$ and $f_2(t) = -sce^{-st}$. Moreover, $t_0 = 0$ and $y_0 = 0$, since at the moment $t = 0$ the second vessel does not contain any salt. Thus, we have

$$\int_0^t f_1(u)\,du = st,$$

and formula (18) gives

$$y = e^{-st}\left[0 + \int_0^t sce^{-sv}e^{sv}\,dv\right] = scte^{-st}. \qquad (19)$$
This formula completely solves the given problem. Upon examining formula (19), we readily see that the amount of salt in the second container at first increases and then, from the moment $t = 1/s$, begins to decrease, tending to zero as $t \to \infty$. At the moment $t = 1/s = a/b$ the amount of salt in this container is at its maximum, $c/e$. Remarkably, this maximum amount does not depend on either $a$ or $b$. On the other hand, the time interval required to reach the maximum salt concentration in the second container depends on $a$ and $b$, but not at all on $c$.
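Formula (19) and the statements about its maximum are easy to check numerically. In the Python sketch below the values $s = 0.5$ and $c = 3$ are arbitrary illustrative choices; the derivative in the residual of the equation is approximated by a central difference, and the maximum is located on a grid of step 0.01.

```python
import math


def salt(t, s, c):
    """Formula (19): amount of salt in the second container at time t."""
    return s * c * t * math.exp(-s * t)


def residual(t, s, c, dt=1e-6):
    """y' + s*y - s*c*e^{-st} for y = salt(t, s, c); y' by a central difference."""
    dy = (salt(t + dt, s, c) - salt(t - dt, s, c)) / (2 * dt)
    return dy + s * salt(t, s, c) - s * c * math.exp(-s * t)


s, c = 0.5, 3.0                          # illustrative values only
ts = [j * 0.01 for j in range(1, 2001)]  # grid on (0, 20]
t_max = max(ts, key=lambda t: salt(t, s, c))
print(t_max, salt(t_max, s, c), c / math.e)     # peak near t = 1/s, height near c/e
print(max(abs(residual(t, s, c)) for t in ts))  # small: (19) satisfies the equation
```

Changing `s` moves the peak to $t = 1/s$ but leaves its height at $c/e$, in agreement with the remark above.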

All these and many other characteristics of the phenomenon 
under consideration become evident from the study of the function 
(19). We see that the solution of the differential equation actually 
permits us to obtain all the necessary information about the course 
of the process as a whole, while the differential equation itself gives 
us only instantaneous (local) relations among the quantities taking 
part in this process. 

Another frequently encountered type of equation, which can be 
reduced by a simple transformation of the unknown function to an 
equation with separable variables, is the so-called homogeneous dif- 
ferential equation of the general form 


$$\frac{dy}{dx} = f\left(\frac{y}{x}\right). \qquad (20)$$


To this type belong, in particular, the frequently encountered equa- 
tions of the form 


$$P(x, y)\,dy = Q(x, y)\,dx,$$

where $P(x, y)$ and $Q(x, y)$ are homogeneous polynomials of the same degree $n$. For, after division by $x^n$, such a homogeneous polynomial becomes a polynomial with respect to the variable $y/x$. Consequently, the numerator and the denominator on the right side of the relation

$$\frac{dy}{dx} = \frac{Q(x, y)}{P(x, y)} = \frac{Q(x, y)/x^n}{P(x, y)/x^n}$$

are polynomials with respect to $y/x$, which means that the whole right side is a rational function of this ratio.
Transforming the unknown function in equation (20) by means of the substitution $y = xy^*$, we can write the equation in the form

$$x\frac{dy^*}{dx} + y^* = f(y^*),$$

whence

$$\frac{dy^*}{dx} = \frac{f(y^*) - y^*}{x}, \qquad \frac{dy^*}{f(y^*) - y^*} = \frac{dx}{x}.$$

The variables have now been separated. Integrating, we easily find an expression for $x$ in terms of $y^*$. If this expression allows us to express $y^*$ as a single-valued function of $x$, then we also obtain an expression for $y$ in terms of $x$. In the general case, this relation determines $y^*$, and hence also $y$, as an implicit function of $x$.
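A worked case may help. The equation $2xy\,dy = (x^2 + y^2)\,dx$, chosen here only as an illustration, has the form $P\,dy = Q\,dx$ with homogeneous polynomials of degree 2, so $f(u) = (1 + u^2)/(2u)$ and $f(u) - u = (1 - u^2)/(2u)$. Separating and integrating gives $x(1 - y^{*2}) = C$, that is, $x^2 - y^2 = Cx$; the condition $y(1) = 2$ yields $y = \sqrt{x^2 + 3x}$. The Python sketch integrates both the original equation and the separated equation for $y^*$ with the classical Runge-Kutta method (a standard numerical scheme, not one used in the text) and compares them with this formula.

```python
import math


def rk4(g, x0, y0, x1, n=1000):
    """Classical Runge-Kutta approximation of y(x1) for y' = g(x, y), y(x0) = y0."""
    h = (x1 - x0) / n
    x, y = x0, y0
    for _ in range(n):
        k1 = g(x, y)
        k2 = g(x + h / 2, y + h * k1 / 2)
        k3 = g(x + h / 2, y + h * k2 / 2)
        k4 = g(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y


def f(u):
    """Right side of (20) for P dy = Q dx with P = 2xy, Q = x^2 + y^2."""
    return (1 + u * u) / (2 * u)


def exact(x):
    # Separating dy*/(f(y*) - y*) = dx/x gives x * (1 - y*^2) = C, i.e.
    # x^2 - y^2 = C x; the condition y(1) = 2 fixes C = -3.
    return math.sqrt(x * x + 3 * x)


y_direct = rk4(lambda x, y: f(y / x), 1.0, 2.0, 3.0)         # equation (20) itself
y_star = rk4(lambda x, ys: (f(ys) - ys) / x, 1.0, 2.0, 3.0)  # y* = y/x, y*(1) = 2
print(y_direct, 3.0 * y_star, exact(3.0))
```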


64. SYSTEMS OF EQUATIONS OF HIGHER ORDERS 


In conclusion, we shall touch briefly on systems of first-order 
differential equations and on equations of higher order. 

Suppose that we have a system of $n$ first-order differential equations containing $n$ unknown functions $y_1, y_2, \ldots, y_n$ of one independent variable $x$. Clearly, we may regard as a solution of such a system any system of functions $y_i = \varphi_i(x)$ $(1 \le i \le n)$ satisfying all the given equations. Let us assume that the given system of equations has been solved for the derivatives $\frac{dy_i}{dx}$ $(1 \le i \le n)$, and therefore has the form

$$\frac{dy_i}{dx} = f_i(x, y_1, y_2, \ldots, y_n) \qquad (1 \le i \le n). \qquad (21)$$

Here again, it will be convenient to call a set of corresponding values of the variables $x, y_1, y_2, \ldots, y_n$ a point. This is, of course, a point in a space of $n + 1$ dimensions.

Throughout our exposition, we shall assume the continuity of all functions $f_i$ in a region $D$ of this space. The geometrical representation of any system of functions $y_i = \varphi_i(x)$ $(1 \le i \le n)$ in this space is a curve; and if the functions in question constitute a solution of a system of equations of the form (21), we may call the curve representing them an integral curve of this system.

This geometrical terminology is extremely advantageous here, 
not only because of its inherently descriptive nature, but also (and 
chiefly) because with its aid many formulations and arguments 
can be carried out and presented in the same form and terminology 
as would be used for a single equation.

Thus, we can formulate the fundamental theorem on the exist- 
ence of solutions quite concisely as follows: 


THEOREM 4. Through every point within the region D there passes 
at least one integral curve. 
Analytically this means that for any system of $n + 1$ numbers $x_0, y_1^0, y_2^0, \ldots, y_n^0$ within $D$ there exists a system of functions $y_i = \varphi_i(x)$ $(1 \le i \le n)$ satisfying the system of equations (21) in a certain interval $x_0 - a < x < x_0 + a$, as well as satisfying the conditions $\varphi_i(x_0) = y_i^0$. The proof of this theorem is only slightly more complicated than in the case of one equation. It is based on a lemma which is completely analogous to the one we employed earlier. This lemma may either be proved directly or be deduced as a corollary from our previous lemma. The latter method is particularly simple. Here, the elements of the family are not single functions, but systems $S = \{F_1(x), F_2(x), \ldots, F_n(x)\}$ of $n$ functions; and we have to prove that, under the assumptions of boundedness and equicontinuity for the collection of all functions belonging to any system of the given family, we can select a sequence of systems $S_1, S_2, \ldots$ uniformly convergent in the given interval. This means, setting

$$S_k = \{F_{1k}(x), F_{2k}(x), \ldots, F_{nk}(x)\} \qquad (k = 1, 2, \ldots),$$

that we shall obtain $n$ sequences of functions

$$F_{i1}(x), F_{i2}(x), \ldots, F_{ik}(x), \ldots \qquad (1 \le i \le n),$$

each of which is uniformly convergent in the given interval.


Proof of the extended lemma. We construct the proof in the following way: first, on the basis of our previous lemma, we can find a sequence of systems $S_k$ such that the sequence $F_{1k}(x)$ converges uniformly in the given interval. From this sequence of systems we can, again by our previous lemma, extract a subsequence in which the sequence of second functions $F_{2k}(x)$ also converges uniformly in the given interval. Repeating this procedure $n$ times, we clearly arrive at a sequence of systems in which all $n$ sequences $F_{ik}(x)$ $(1 \le i \le n)$ converge uniformly in the given interval, and thus the proof of our new, extended lemma is completed.


Proof of Theorem 4. This is now carried out in strict analogy 
with our previous reasoning, and it is most useful to adopt our 
geometrical approach as a guide. Again, we construct a set of 
broken lines with segments of arbitrarily small lengths. From this 
set, using the lemma just proved, we select a sequence converging 
uniformly to a certain curve, which we can then prove by the same 
method as before (with minor and self-evident modifications of a 
purely technical nature) to be an integral curve of the system of equations (21). This curve passes through the given point $(x_0, y_1^0, y_2^0, \ldots, y_n^0)$, since all the broken lines pass through this point.
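The broken-line construction carries over to systems unchanged, as the proof indicates. In this Python sketch the system $y_1' = y_2$, $y_2' = -y_1$ through the point $(0, 1, 0)$ is an illustrative example with integral curve $y_1 = \cos x$, $y_2 = -\sin x$. Each segment in the space of points $(x, y_1, y_2)$ follows the direction of the field at its left end point, and the largest deviation from the integral curve shrinks as the segments get shorter.

```python
import math


def broken_line(fs, x0, point, a, n):
    """Vertices of the broken line for the system y_i' = f_i(x, y_1, ..., y_n).

    fs is a list of functions f_i(x, ys), point holds the initial values
    (y_1, ..., y_n) at x0, and each segment follows the field direction
    (1, f_1, ..., f_n) taken at the segment's left end point.
    """
    h = a / n
    x, ys = x0, list(point)
    vertices = [(x, tuple(ys))]
    for _ in range(n):
        slopes = [fi(x, ys) for fi in fs]  # all slopes from the same left end point
        ys = [y + h * s for y, s in zip(ys, slopes)]
        x += h
        vertices.append((x, tuple(ys)))
    return vertices


# Illustrative system: y1' = y2, y2' = -y1, through (x, y1, y2) = (0, 1, 0).
fs = [lambda x, ys: ys[1], lambda x, ys: -ys[0]]


def max_deviation(n, a=1.0):
    """Largest componentwise distance of the vertices from (cos x, -sin x)."""
    verts = broken_line(fs, 0.0, (1.0, 0.0), a, n)
    return max(max(abs(y1 - math.cos(x)), abs(y2 + math.sin(x)))
               for x, (y1, y2) in verts)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, max_deviation(n))
```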

The uniqueness of the integral curve passing through a given point can be proved, as before, only after imposing certain additional conditions on the functions $f_i$. The simplest form of these conditions is completely analogous to condition (A) on page 220 and reduces to the requirement that the inequalities

$$|f_i(x, y_1, y_2, \ldots, y_n) - f_i(x, \bar y_1, \bar y_2, \ldots, \bar y_n)| \le k \sum_{j=1}^{n} |y_j - \bar y_j| \qquad (1 \le i \le n),$$

where $k$ is some positive constant, be satisfied in the entire region $D$.
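As an illustration of when such a condition holds, consider a linear system with constant coefficients, $f_i = \sum_j a_{ij} y_j$ (an example of our own). For two points $y$ and $z$ with the same $x$, the triangle inequality gives $|f_i(y) - f_i(z)| \le \max_{i,j}|a_{ij}| \sum_j |y_j - z_j|$, so $k = \max|a_{ij}|$ works. The Python sketch below only spot-checks this inequality at random points; it is a sanity check, not a proof.

```python
import random


def k_for_linear_system(A):
    """Constant k = max |a_ij| for the linear system f_i(y) = sum_j a_ij * y_j."""
    return max(abs(a) for row in A for a in row)


def spot_check(A, trials=1000, seed=0):
    """Test |f_i(y) - f_i(z)| <= k * sum_j |y_j - z_j| at random points y, z.

    Returns True if no violation is found (a check, not a proof).
    """
    rng = random.Random(seed)
    n = len(A)
    k = k_for_linear_system(A)
    for _ in range(trials):
        y = [rng.uniform(-5.0, 5.0) for _ in range(n)]
        z = [rng.uniform(-5.0, 5.0) for _ in range(n)]
        rhs = k * sum(abs(yj - zj) for yj, zj in zip(y, z))
        for row in A:
            lhs = abs(sum(a * (yj - zj) for a, yj, zj in zip(row, y, z)))
            if lhs > rhs + 1e-12 * (1.0 + rhs):  # small allowance for rounding
                return False
    return True


if __name__ == "__main__":
    print(spot_check([[0.0, 1.0], [-1.0, 0.0]]))
    print(spot_check([[1.5, -2.0, 0.3], [0.0, 4.0, -1.0], [2.5, 0.1, 0.2]]))
```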


We have no time left to consider equations of higher order 
systematically. We shall only show that the solution of any equation 


$$F(x, y, y', y'', \ldots, y^{(n)}) = 0 \qquad (22)$$


of order n can be reduced to the solution of a system of first-order 
equations. 

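Let us consider a system of $n$ first-order equations with unknown functions $y_1, y_2, \ldots, y_n$ of the following form:

$$\left\{\begin{aligned} &F(x, y_1, y_2, \ldots, y_n, y_n') = 0,\\ &y_1' = y_2,\\ &y_2' = y_3,\\ &\qquad\cdots\\ &y_{n-1}' = y_n. \end{aligned}\right. \qquad (23)$$

And let us suppose that we have found a solution

$$y_i = \varphi_i(x) \qquad (1 \le i \le n)$$

of this system of equations. By equations (23), we shall then have

$$\begin{aligned} \varphi_1'(x) &= \varphi_2(x) = y_2,\\ \varphi_1''(x) &= \varphi_2'(x) = \varphi_3(x) = y_3,\\ \varphi_1'''(x) &= \varphi_2''(x) = \varphi_3'(x) = \varphi_4(x) = y_4,\\ &\cdots\\ \varphi_1^{(n-1)}(x) &= \varphi_2^{(n-2)}(x) = \cdots = \varphi_n(x) = y_n,\\ \varphi_1^{(n)}(x) &= \varphi_n'(x) = y_n'. \end{aligned}$$

Consequently, the first of the equations (23) yields

$$F[x, \varphi_1(x), \varphi_1'(x), \ldots, \varphi_1^{(n)}(x)] = 0,$$

that is, $y = \varphi_1(x)$ is a solution of equation (22).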
Conversely, let the function $y = \varphi(x)$ satisfy equation (22). Setting

$$y_1 = \varphi(x), \quad y_2 = \varphi'(x), \quad \ldots, \quad y_n = \varphi^{(n-1)}(x),$$

we see immediately that the system of functions $(y_1, y_2, \ldots, y_n)$ constitutes a solution of the system of equations (23). Thus, the problems of finding all the solutions of equation (22) and all those of the system of equations (23) are completely reducible to each other.
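The reduction can be used directly for numerical work once (22) has been solved for the highest derivative, $y^{(n)} = G(x, y, y', \ldots, y^{(n-1)})$; this solved form, the name `G`, and the Runge-Kutta integrator are assumptions made for the illustration. The Python sketch forms the right sides of the system, with $y_1 = y$, $y_2 = y'$, ..., $y_n = y^{(n-1)}$, and solves $y''' = y$ with $y(0) = y'(0) = y''(0) = 1$, whose solution is $e^x$; all three components at $x = 1$ approximate $e$.

```python
import math


def as_first_order_system(G, n):
    """Right sides of the system for the solved equation y^(n) = G(x, y, y', ..., y^(n-1)).

    The unknowns are y_1 = y, y_2 = y', ..., y_n = y^(n-1).
    """
    def rhs(x, ys):
        # y_1' = y_2, ..., y_{n-1}' = y_n, and y_n' = G(x, y_1, ..., y_n)
        assert len(ys) == n
        return list(ys[1:]) + [G(x, *ys)]
    return rhs


def rk4_system(rhs, x0, ys0, x1, steps=1000):
    """Classical Runge-Kutta integration of the system from x0 to x1."""
    h = (x1 - x0) / steps
    x, ys = x0, list(ys0)
    for _ in range(steps):
        k1 = rhs(x, ys)
        k2 = rhs(x + h / 2, [y + h / 2 * k for y, k in zip(ys, k1)])
        k3 = rhs(x + h / 2, [y + h / 2 * k for y, k in zip(ys, k2)])
        k4 = rhs(x + h, [y + h * k for y, k in zip(ys, k3)])
        ys = [y + h / 6 * (q1 + 2 * q2 + 2 * q3 + q4)
              for y, q1, q2, q3, q4 in zip(ys, k1, k2, k3, k4)]
        x += h
    return ys


# Example: y''' = y with y(0) = y'(0) = y''(0) = 1, whose solution is e^x.
rhs = as_first_order_system(lambda x, y, dy, d2y: y, 3)
y1, y2, y3 = rk4_system(rhs, 0.0, [1.0, 1.0, 1.0], 1.0)
print(y1, y2, y3, math.e)  # y1 = y, y2 = y', y3 = y'' at x = 1
```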